PROTEINS

The present invention relates to proteins derived from Streptococcus pneumoniae, 
nucleic acid molecules encoding such proteins, the use of the nucleic acid and/or 
proteins as antigens/immunogens and in detection/diagnosis, as well as methods for 
screening the proteins/nucleic acid sequences as potential anti-microbial targets. 

Streptococcus pneumoniae, commonly referred to as the pneumococcus, is an 
important pathogenic organism. The continuing significance of Streptococcus
pneumoniae infections in relation to human disease in developing and developed 
countries has been authoritatively reviewed (Siber, G.R., Science, 265: 1385-1387
(1994)). That indicates that on a global scale this organism is believed to be the most 
common bacterial cause of acute respiratory infections, and is estimated to result in 1 
million childhood deaths each year, mostly in developing countries (Stansfield, S.K., 
Pediatr. Infect. Dis., 6: 622 (1987)). In the USA it has been suggested (Breiman et al, 
Arch Intern. Med., 150: 1401 (1990)) that the pneumococcus is still the most common 
cause of bacterial pneumonia, and that disease rates are particularly high in young 
children, in the elderly, and in patients with predisposing conditions such as asplenia, 
heart, lung and kidney disease, diabetes, alcoholism, or with immunosuppressive
disorders, especially AIDS. These groups are at higher risk of pneumococcal 
septicaemia and hence meningitis and therefore have a greater risk of dying from 
pneumococcal infection. The pneumococcus is also the leading cause of otitis media 
and sinusitis, which remain prevalent infections in children in developed countries, 
and which incur substantial costs. 

The need for effective preventative strategies against pneumococcal infection is 
highlighted by the recent emergence of penicillin-resistant pneumococci. It has been 
reported that 6.6% of pneumococcal isolates in 13 US hospitals in 12 states were found
to be resistant to penicillin and some isolates were also resistant to other antibiotics
including third generation cephalosporins (Schappert, S.M., Vital and Health Statistics of
the Centres for Disease Control/National Centre for Health Statistics, 214:1 (1992)). 



The rates of penicillin resistance can be higher (up to 20%) in some hospitals 
(Breiman et al., J. Am. Med. Assoc., 271: 1831 (1994)). Since the development of
penicillin resistance among pneumococci is both recent and sudden, coming after 
decades during which penicillin remained an effective treatment, these findings are 
regarded as alarming. 

For the reasons given above, there are therefore compelling grounds for considering 
improvements in the means of preventing, controlling, diagnosing or treating 
pneumococcal diseases. 

Various approaches have been taken in order to provide vaccines for the prevention of 
pneumococcal infections. Difficulties arise for instance in view of the variety of 
serotypes (at least 90) based on the structure of the polysaccharide capsule 
surrounding the organism. Vaccines against individual serotypes are not effective 
against other serotypes and this means that vaccines must include polysaccharide 
antigens from a whole range of serotypes in order to be effective in a majority of 
cases. An additional problem arises because it has been found that the capsular 
polysaccharides (each of which determines the serotype and is the major protective 
antigen) when purified and used as a vaccine do not reliably induce protective 
antibody responses in children under two years of age, the age group which suffers the 
highest incidence of invasive pneumococcal infection and meningitis. 

A modification of the approach using capsule antigens relies on conjugating the 
polysaccharide to a protein in order to derive an enhanced immune response, 
particularly by giving the response T-cell dependent character. This approach has been 
used in the development of a vaccine against Haemophilus influenzae, for instance. 
There are, however, issues of cost concerning both the multi-polysaccharide vaccines 
and those based on conjugates. 

A third approach is to look for other antigenic components which offer the potential to
be vaccine candidates. This is the basis of the present invention. Using a specially
developed bacterial expression system, we have been able to identify a group of 
protein antigens from pneumococcus which are associated with the bacterial envelope
or which are secreted. 

Thus, in a first aspect the present invention provides a Streptococcus pneumoniae 
protein or polypeptide having a sequence selected from those shown in Table 1.

In a second aspect, the present invention provides a Streptococcus pneumoniae protein 
or polypeptide having a sequence selected from those shown in Table 2.

A protein or polypeptide of the present invention may be provided in substantially pure 
form. For example, it may be provided in a form which is substantially free of other 
proteins. 

As discussed herein, the proteins and polypeptides of the invention are useful as antigenic 
material. Such material can be "antigenic" and/or "immunogenic". Generally, "antigenic" 
is taken to mean that the protein or polypeptide is capable of being used to raise 
antibodies or indeed is capable of inducing an antibody response in a subject. 
"Immunogenic" is taken to mean that the protein or polypeptide is capable of eliciting a
protective immune response in a subject. Thus, in the latter case, the protein or 
polypeptide may be capable of not only generating an antibody response but, in addition, 
a non-antibody based immune response. 

The skilled person will appreciate that homologues or derivatives of the proteins or
polypeptides of the invention will also find use in the context of the present invention, i.e.
as antigenic/immunogenic material. Thus, for instance proteins or polypeptides which
include one or more additions, deletions, substitutions or the like are encompassed by the
present invention. In addition, it may be possible to replace one amino acid with another
of similar "type", for instance replacing one hydrophobic amino acid with another.

One can use a program such as the CLUSTAL program to compare amino acid
sequences. This program compares amino acid sequences and finds the optimal 
alignment by inserting spaces in either sequence as appropriate. It is possible to calculate 
amino acid identity or similarity (identity plus conservation of amino acid type) for an 
optimal alignment. A program like BLASTx will align the longest stretch of similar
sequences and assign a value to the fit. It is thus possible to obtain a comparison where 
several regions of similarity are found, each having a different score. Both types of 
identity analysis are contemplated in the present invention.
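
The distinction drawn above between identity and similarity (identity plus conservation of amino acid type) can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned, with "-" marking inserted spaces. The grouping of amino acids into "types" and the choice of alignment length as the denominator are illustrative assumptions only; they are not taken from CLUSTAL or from the present specification.

```python
# Hypothetical grouping of amino acids by "type"; other groupings are possible.
AMINO_ACID_TYPES = [
    set("AVLIMFWC"),  # hydrophobic (assumed grouping)
    set("STNQYG"),    # polar, uncharged (assumed grouping)
    set("KRH"),       # basic
    set("DE"),        # acidic
    set("P"),
]


def same_type(a, b):
    """Return True if both residues fall in the same assumed type group."""
    return any(a in group and b in group for group in AMINO_ACID_TYPES)


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (% identity, % similarity) for two pre-aligned sequences.

    Percentages are taken over the full alignment length, gaps included.
    This denominator is an assumption; other conventions exist.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned, equal length and non-empty")
    identical = 0
    similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # a gap column counts as neither identical nor similar
        if a == b:
            identical += 1
            similar += 1
        elif same_type(a, b):
            similar += 1
    length = len(aligned_a)
    return 100 * identical / length, 100 * similar / length


if __name__ == "__main__":
    ident, sim = identity_and_similarity("MKLV-A", "MRLIQA")
    print(f"identity {ident:.1f}%, similarity {sim:.1f}%")
```

In the usage example, K/R and V/I are counted as conservative replacements, so similarity exceeds identity.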

In the case of homologues and derivatives, the degree of identity with a protein or
polypeptide as described herein is less important than that the homologue or derivative 
should retain the antigenicity or immunogenicity of the original protein or polypeptide. 
However, suitably, homologues or derivatives having at least 60% similarity (as 
discussed above) with the proteins or polypeptides described herein are provided. 

Preferably, homologues or derivatives having at least 70% similarity, more preferably at
least 80% similarity are provided. Most preferably, homologues or derivatives having at 
least 90% or even 95% similarity are provided. 

In an alternative approach, the homologues or derivatives could be fusion proteins, 
incorporating moieties which render purification easier, for example by effectively
tagging the desired protein or polypeptide. It may be necessary to remove the "tag" or it 
may be the case that the fusion protein itself retains sufficient antigenicity to be useful. 

In an additional aspect of the invention there are provided antigenic/immunogenic 
fragments of the proteins or polypeptides of the invention, or of homologues or
derivatives thereof.

For fragments of the proteins or polypeptides described herein, or of homologues or 
derivatives thereof, the situation is slightly different. It is well known that it is possible to
screen an antigenic protein or polypeptide to identify epitopic regions, i.e. those regions
which are responsible for the protein or polypeptide's antigenicity or immunogenicity.
Methods for carrying out such screening are well known in the art. Thus, the fragments 
of the present invention should include one or more such epitopic regions or be 
sufficiently similar to such regions to retain their antigenic/immunogenic properties. 
Thus, for fragments according to the present invention the degree of identity is perhaps 
irrelevant, since they may be 100% identical to a particular part of a protein or 
polypeptide, homologue or derivative as described herein. The key issue, once again, is 
that the fragment retains the antigenic/immunogenic properties. 

Thus, what is important for homologues, derivatives and fragments is that they possess at 
least a degree of the antigenicity/immunogenicity of the protein or polypeptide from 
which they are derived. 

Gene cloning techniques may be used to provide a protein of the invention in 
substantially pure form. These techniques are disclosed, for example, in J. Sambrook et 
al., Molecular Cloning, 2nd Edition, Cold Spring Harbor Laboratory Press (1989). Thus,
in a third aspect, the present invention provides a nucleic acid molecule comprising or 
consisting of a sequence which is: 

(i) any of the DNA sequences set out in Table 1 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i); 

(iii) a sequence which codes for the same protein or polypeptide, as those
sequences of (i) or (ii);

(iv) a sequence which has substantial identity with any of those of (i), (ii) and
(iii); or

(v) a sequence which codes for a homologue, derivative or fragment of a
protein as defined in Table 1.

In a fourth aspect the present invention provides a nucleic acid molecule comprising or 
consisting of a sequence which is: 

(i) any of the DNA sequences set out in Table 2 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i); 

(iii) a sequence which codes for the same protein or polypeptide, as those 
sequences of (i) or (ii); 

(iv) a sequence which has substantial identity with any of those of (i), (ii) and 
(iii); or 

(v) a sequence which codes for a homologue, derivative or fragment of a 
protein as defined in Table 2. 

The nucleic acid molecules of the invention may include a plurality of such sequences, 
and/or fragments. The skilled person will appreciate that the present invention can 
include novel variants of those particular novel nucleic acid molecules which are 
exemplified herein. Such variants are encompassed by the present invention. These may 
occur in nature, for example because of strain variation. For example, additions, 
substitutions and/or deletions are included. In addition, and particularly when utilising 
microbial expression systems, one may wish to engineer the nucleic acid sequence by 
making use of known preferred codon usage in the particular organism being used for 
expression. Thus, synthetic or non-naturally occurring variants are also included within 
the scope of the invention. 
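
As a purely illustrative sketch of engineering a coding sequence around preferred codon usage, the following Python fragment back-translates a protein sequence using one chosen codon per amino acid. The `PREFERRED_CODONS` table is a hypothetical placeholder: each codon is a valid codon for its amino acid in the standard genetic code, but the choices do not represent measured codon usage for any particular expression host.

```python
# Hypothetical "preferred" codon per amino acid (standard genetic code).
# The choices are placeholders, not measured usage data for any organism.
PREFERRED_CODONS = {
    "A": "GCT", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGT", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCT", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTT",
}


def back_translate(protein, codon_table=PREFERRED_CODONS):
    """Return a DNA sequence encoding `protein` using the given codon table."""
    codons = []
    for residue in protein.upper():
        if residue not in codon_table:
            raise ValueError(f"no codon defined for residue {residue!r}")
        codons.append(codon_table[residue])
    return "".join(codons)


if __name__ == "__main__":
    print(back_translate("MKL"))
```

Because the genetic code is degenerate, many different nucleic acid sequences encode the same polypeptide; swapping in a different table changes the DNA but not the encoded protein.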

The term "RNA equivalent" when used above indicates that a given RNA molecule has a
sequence which is complementary to that of a given DNA molecule (allowing for the fact 
that in RNA "U" replaces "T" in the genetic code). 
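
By way of illustration only, the relationship between a DNA sequence and an RNA sequence can be sketched in Python as follows. The function names are hypothetical. `complementary_rna` returns the RNA strand complementary to a DNA strand, written 5' to 3' (an assumed convention), and `rna_version` simply substitutes "U" for "T".

```python
# Base pairing between a DNA template base and the RNA base opposite it.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def complementary_rna(dna):
    """Return the RNA complementary to `dna`, written 5'->3'.

    The input is read 5'->3', so the complement is reversed on output.
    """
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(dna.upper()))


def rna_version(dna):
    """Return the same sequence with U in place of T."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(complementary_rna("ATGC"))
    print(rna_version("ATGC"))
```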

When comparing nucleic acid sequences for the purposes of determining the degree of
homology or identity one can use programs such as BESTFIT and GAP (both from the
Wisconsin Genetics Computer Group (GCG) software package). BESTFIT, for example,
compares two sequences and produces an optimal alignment of the most similar
segments. GAP enables sequences to be aligned along their whole length and finds the
optimal alignment by inserting spaces in either sequence as appropriate. Suitably, in the
context of the present invention when discussing identity of nucleic acid sequences, the 
comparison is made by alignment of the sequences along their whole length. 
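
Alignment of two sequences along their whole length, with spaces inserted as appropriate, can be sketched with a simple Needleman-Wunsch style dynamic programme. This is a minimal illustration, not the GAP program itself: the scoring values (match +1, mismatch -1, gap -1) and the definition of percent identity over the alignment length are assumptions chosen for simplicity.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align `a` and `b`; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # space inserted in b
                score[i][j - 1] + gap,      # space inserted in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of the alignment length (assumed)."""
    same = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100 * same / len(aligned_a)


if __name__ == "__main__":
    x, y, s = global_align("ACGT", "AGT")
    print(x, y, s, percent_identity(x, y))
```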

Preferably, sequences which have substantial identity have at least 50% sequence 
identity, desirably at least 75% sequence identity and more desirably at least 90% or at least
95% sequence identity with said sequences. In some cases the sequence identity may be 
99% or above. 

Desirably, the term "substantial identity" indicates that said sequence has a greater degree 
of identity with any of the sequences described herein than with prior art nucleic acid
sequences. 

It should however be noted that where a nucleic acid sequence of the present invention 
codes for at least part of a novel gene product the present invention includes within its 
scope all possible sequences coding for the gene product or for a novel part thereof.

The nucleic acid molecule may be in isolated or recombinant form. It may be 
incorporated into a vector and the vector may be incorporated into a host. Such vectors 
and suitable hosts form yet further aspects of the present invention. 

Therefore, for example, by using probes based upon the nucleic acid sequences provided
herein, genes in Streptococcus pneumoniae can be identified. They can then be excised 
using restriction enzymes and cloned into a vector. The vector can be introduced into a 
suitable host for expression. 

Nucleic acid molecules of the present invention may be obtained from S.pneumoniae by 
the use of appropriate probes complementary to part of the sequences of the nucleic acid 
molecules. Restriction enzymes or sonication techniques can be used to obtain 
appropriately sized fragments for probing. 

Alternatively PCR techniques may be used to amplify a desired nucleic acid sequence.
Thus the sequence data provided herein can be used to design two primers for use in PCR 
so that a desired sequence, including whole genes or fragments thereof, can be targeted 
and then amplified to a high degree. 

Typically primers will be at least 15-25 nucleotides long. 
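
The design of two primers flanking a desired sequence can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the forward primer is simply the first bases of the target, and the reverse primer is the reverse complement of its last bases. Practical primer design would also consider melting temperature, secondary structure and specificity, none of which is modelled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def design_primer_pair(target, length=20):
    """Return (forward, reverse) primers, each written 5'->3'.

    `length` is restricted to the 15-25 nucleotide range mentioned in the text.
    """
    if not 15 <= length <= 25:
        raise ValueError("primer length should be between 15 and 25 nucleotides")
    if len(target) < 2 * length:
        raise ValueError("target is too short for non-overlapping primers")
    target = target.upper()
    forward = target[:length]                       # anneals to the bottom strand
    reverse = reverse_complement(target[-length:])  # anneals to the top strand
    return forward, reverse


if __name__ == "__main__":
    demo_target = "A" * 15 + "C" * 10 + "G" * 15  # artificial demonstration sequence
    print(design_primer_pair(demo_target, length=15))
```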

As a further alternative chemical synthesis may be used. This may be automated. 
Relatively short sequences may be chemically synthesised and ligated together to provide 
a longer sequence. 

There is another group of proteins from S.pneumoniae which have been identified 
using the bacterial expression system described herein. These are known proteins from 
S.pneumoniae, which have not previously been identified as antigenic proteins. The 
amino acid sequences of this group of proteins, together with DNA sequences coding 
for them are shown in Table 3. These proteins, or homologues, derivatives and/or 
fragments thereof also find use as antigens/immunogens. Thus, in another aspect the 
present invention provides the use of a protein or polypeptide having a sequence 
selected from those shown in Tables 1-3, or homologues, derivatives and/or fragments 
thereof, as an immunogen/antigen. 



In yet a further aspect the present invention provides an immunogenic/antigenic
composition comprising one or more proteins or polypeptides selected from those 
whose sequences are shown in Tables 1-3, or homologues or derivatives thereof, 
and/or fragments of any of these. In preferred embodiments, the
immunogenic/antigenic composition is a vaccine or is for use in a diagnostic assay.

In the case of vaccines suitable additional excipients, diluents, adjuvants or the like 
may be included. Numerous examples of these are well known in the art. 

It is also possible to utilise the nucleic acid sequences shown in Tables 1-3 in the 
preparation of so-called DNA vaccines. Thus, the invention also provides a vaccine 
composition comprising one or more nucleic acid sequences as defined herein. DNA 
vaccines are described in the art (see for instance, Donnelly et al., Ann. Rev.
Immunol., 15:617-648 (1997)) and the skilled person can use such art described
techniques to produce and use DNA vaccines according to the present invention. 

As already discussed herein the proteins or polypeptides described herein, their 
homologues or derivatives, and/or fragments of any of these, can be used in methods
of detecting/diagnosing S.pneumoniae. Such methods can be based on the detection of
antibodies against such proteins which may be present in a subject. Therefore the 
present invention provides a method for the detection/diagnosis of S.pneumoniae 
which comprises the step of bringing into contact a sample to be tested with at least 
one protein, or homologue, derivative or fragment thereof, as described herein. 

Suitably, the sample is a biological sample, such as a tissue sample or a sample of
blood or saliva obtained from a subject to be tested. 

In an alternative approach, the proteins described herein, or homologues, derivatives 
and/or fragments thereof, can be used to raise antibodies, which in turn can be used to 
detect the antigens, and hence S.pneumoniae. Such antibodies form another aspect of
the invention. Antibodies within the scope of the present invention may be monoclonal
or polyclonal. 

Polyclonal antibodies can be raised by stimulating their production in a suitable animal 
host (e.g. a mouse, rat, guinea pig, rabbit, sheep, goat or monkey) when a protein as 
described herein, or a homologue, derivative or fragment thereof, is injected into the 
animal. If desired, an adjuvant may be administered together with the protein. Well- 
known adjuvants include Freund's adjuvant (complete and incomplete) and aluminium 
hydroxide. The antibodies can then be purified by virtue of their binding to a protein as 
described herein. 

Monoclonal antibodies can be produced from hybridomas. These can be formed by 
fusing myeloma cells and spleen cells which produce the desired antibody in order to 
form an immortal cell line. Thus the well-known Kohler & Milstein technique (Nature
256 (1975)) or subsequent variations upon this technique can be used. 

Techniques for producing monoclonal and polyclonal antibodies that bind to a particular 
polypeptide/protein are now well developed in the art. They are discussed in standard 
immunology textbooks, for example in Roitt et al., Immunology, second edition (1989),
Churchill Livingstone, London. 

In addition to whole antibodies, the present invention includes derivatives thereof which 
are capable of binding to proteins etc as described herein. Thus the present invention 
includes antibody fragments and synthetic constructs. Examples of antibody fragments 
and synthetic constructs are given by Dougall et al in Tibtech 12 372-379 (September 
1994). 

Antibody fragments include, for example, Fab, F(ab')2 and Fv fragments. Fab fragments
are discussed in Roitt et al. (supra). Fv fragments can be modified to produce a
synthetic construct known as a single chain Fv (scFv) molecule. This includes a peptide
linker covalently joining VH and VL regions, which contributes to the stability of the
molecule. Other synthetic constructs that can be used include CDR peptides. These are 
synthetic peptides comprising antigen-binding determinants. Peptide mimetics may also 
be used. These molecules are usually conformationally restricted organic rings that
mimic the structure of a CDR loop and that include antigen-interactive side chains. 

Synthetic constructs include chimaeric molecules. Thus, for example, humanised (or
primatised) antibodies or derivatives thereof are within the scope of the present invention. 
An example of a humanised antibody is an antibody having human framework regions, 
but rodent hypervariable regions. Ways of producing chimaeric antibodies are discussed 
for example by Morrison et al in PNAS, 81, 6851-6855 (1984) and by Takeda et al in 
Nature, 314, 452-454 (1985).

Synthetic constructs also include molecules comprising an additional moiety that 
provides the molecule with some desirable property in addition to antigen binding. For 
example the moiety may be a label (e.g. a fluorescent or radioactive label). Alternatively, 
it may be a pharmaceutically active agent. 

Antibodies, or derivatives thereof, find use in detection/diagnosis of S.pneumoniae. Thus,
in another aspect the present invention provides a method for the detection/diagnosis of 
S.pneumoniae which comprises the step of bringing into contact a sample to be tested and 
antibodies capable of binding to one or more proteins described herein, or to 
homologues, derivatives and/or fragments thereof. 

In addition, so-called "Affibodies" may be utilised. These are binding proteins
selected from combinatorial libraries of an alpha-helical bacterial receptor domain
(Nord et al.). Thus, small protein domains, capable of specific binding to different
target proteins, can be selected using combinatorial approaches.

It will also be clear that the nucleic acid sequences described herein may be used to
detect/diagnose S. pneumoniae. Thus, in yet a further aspect, the present invention 
provides a method for the detection/diagnosis of S. pneumoniae which comprises the
step of bringing into contact a sample to be tested with at least one nucleic acid
sequence as described herein. Suitably, the sample is a biological sample, such as a
tissue sample or a sample of blood or saliva obtained from a subject to be tested. Such
samples may be pre-treated before being used in the methods of the invention. Thus,
for example, a sample may be treated to extract DNA. Then, DNA probes based on the
nucleic acid sequences described herein (i.e. usually fragments of such sequences) may
be used to detect nucleic acid from S.pneumoniae.

In additional aspects, the present invention provides: 

(a) a method of vaccinating a subject against S.pneumoniae which comprises the
step of administering to a subject a protein or polypeptide of the invention, or a
derivative, homologue or fragment thereof, or an immunogenic composition of the
invention;

(b) a method of vaccinating a subject against S.pneumoniae which comprises the
step of administering to a subject a nucleic acid molecule as defined herein;

(c) a method for the prophylaxis or treatment of S.pneumoniae infection which
comprises the step of administering to a subject a protein or polypeptide of the
invention, or a derivative, homologue or fragment thereof, or an immunogenic
composition of the invention;

(d) a method for the prophylaxis or treatment of S.pneumoniae infection which
comprises the step of administering to a subject a nucleic acid molecule as defined
herein;

(e) a kit for use in detecting/diagnosing S.pneumoniae infection comprising one or
more proteins or polypeptides of the invention, or homologues, derivatives or
fragments thereof, or an antigenic composition of the invention; and

(f) a kit for use in detecting/diagnosing S.pneumoniae infection comprising one or
more nucleic acid molecules as defined herein.

Given that we have identified a group of important proteins, such proteins are potential 
targets for anti-microbial therapy. It is necessary, however, to determine whether each 
individual protein is essential for the organism's viability. Thus, the present invention 
also provides a method of determining whether a protein or polypeptide as described 
herein represents a potential anti-microbial target which comprises antagonising, 
inhibiting or otherwise interfering with the function or expression of said protein and 
determining whether S.pneumoniae is still viable. 

A suitable method for inactivating the protein is to effect selected gene knockouts, i.e.
prevent expression of the protein and determine whether this results in a lethal change.
Suitable methods for carrying out such gene knockouts are described in Li et al.,
P.N.A.S., 94:13251-13256 (1997) and Kolkman et al., 178:3736-3741 (1996).

In a final aspect the present invention provides the use of an agent capable of 
antagonising, inhibiting or otherwise interfering with the function or expression of a 
protein or polypeptide of the invention in the manufacture of a medicament for use in 
the treatment or prophylaxis of S.pneumoniae infection. 

As mentioned above, we have used a bacterial expression system as a means of
identifying those proteins which are surface associated, secreted or exported and thus
would find use as antigens.



The information necessary for the secretion/export of proteins has been extensively 
studied in bacteria. In the majority of cases, protein export requires a signal peptide 
to be present at the N-terminus of the precursor protein so that it becomes directed to 
the translocation machinery on the cytoplasmic membrane. During or after 
translocation, the signal peptide is removed by a membrane associated signal 
peptidase. Ultimately the localization of the protein (i.e. whether it be secreted, an 
integral membrane protein or attached to the cell wall) is determined by sequences 
other than the leader peptide itself. 

We are specifically interested in surface located or exported proteins as these are 
likely to be antigens for use in vaccines, as diagnostic reagents or as targets for 
therapy with novel chemical entities. We have therefore developed a screening 
vector-system in Lactococcus lactis that permits genes encoding exported proteins to 
be identified and isolated. We provide below a representative example showing how 
given novel surface associated proteins from Streptococcus pneumoniae have been 
identified and characterized. The screening vector incorporates the staphylococcal 
nuclease gene nuc lacking its own export signal as a secretion reporter. 
Staphylococcal nuclease is a naturally secreted heat-stable, monomeric enzyme 
which has been efficiently expressed and secreted in a range of Gram positive
bacteria (Shortle, Gene, 22:181-189 (1983); Kovacevic et al., J. Bacteriol.,
162:521-528 (1985); Miller et al., J. Bacteriol., 169:3508-3514 (1987); Liebl et al.,
J. Bacteriol., 174:1854-1861 (1992); Le Loir et al., J. Bacteriol., 176:5135-5139
(1994); Poquet et al., J. Bacteriol., 180:1904-1912 (1998)).

Recently, Poquet et al. ((1998), supra) have described a screening vector 
incorporating the nuc gene lacking its own signal leader as a reporter to identify 
exported proteins in Gram positive bacteria, and have applied it to L. lactis. This 
vector (pFUN) contains the pAMβ1 replicon which functions in a broad host range
of Gram-positive bacteria in addition to the ColE1 replicon that promotes replication
in Escherichia coli and certain other Gram negative bacteria. Unique cloning sites
present in the vector can be used to generate transcriptional and translational fusions 
between cloned genomic DNA fragments and the open reading frame of the 
truncated nuc gene devoid of its own signal secretion leader. The nuc gene makes an 
ideal reporter gene because the secretion of nuclease can readily be detected using a
simple and sensitive plate test: Recombinant colonies secreting the nuclease develop 
a pink halo whereas control colonies remain white (Shortle, (1983), supra; Le Loir 
et al., (1994), supra).

Thus, the invention will now be described with reference to the following
representative example, which provides details of how the proteins, polypeptides and
nucleic acid sequences described herein were identified as antigenic targets.

We describe herein the construction of three reporter vectors and their use in L. 
lactis to identify and isolate genomic DNA fragments from Streptococcus
pneumoniae encoding secreted or surface associated proteins. 

EXAMPLE 1 

(i) Construction of the pTREP1-nuc series of reporter vectors
(a) Construction of expression plasmid pTREP1

The pTREP1 plasmid is a high-copy number (40-80 per cell) theta-replicating Gram
positive plasmid, which is a derivative of the pTREX plasmid which is itself a
derivative of the previously published pIL253 plasmid. pIL253 incorporates the
broad host range replicon of pAMβ1 (Simon and Chopin, Biochimie (1988)) and is
non-mobilisable by the L. lactis sex-factor. pIL253 also lacks the tra function which
is necessary for transfer or efficient mobilisation by
conjugative parent plasmids exemplified by pIL501. The Enterococcal pAMβ1
replicon has previously been transferred to various species including Streptococcus,
Lactobacillus and Bacillus species as well as Clostridium acetobutylicum (Oultram
and Klaenhammer, FEMS Microbiological Letters, 27:129-134 (1985); Gibson et al.,
[FULL REF NEEDED] 1979; LeBlanc et al., Proceedings of the National Academy
of Science USA, 75:3484-3487 (1978)) indicating the potential broad host range
utility. The pTREP1 plasmid represents a constitutive transcription vector.

The pTREX vector was constructed as follows. An artificial DNA fragment
containing a putative RNA stabilising sequence, a translation initiation region (TIR),
a multiple cloning site for insertion of the target genes and a transcription terminator
was created by annealing 2 complementary oligonucleotides and extending with Tfl
DNA polymerase. The sense and anti-sense oligonucleotides contained the
recognition sites for NheI and BamHI at their 5' ends respectively to facilitate
cloning. This fragment was cloned between the XbaI and BamHI sites in
pUC19NT7, a derivative of pUC19 which contains the T7 expression cassette from
pLET1 (Wells et al., J. Appl. Bacteriol., 74:629-636 (1993)) cloned between the
EcoRI and HindIII sites. The resulting construct was designated pUCLEX. The
complete expression cassette of pUCLEX was then removed by cutting with HindIII
and blunting followed by cutting with EcoRI before cloning into EcoRI and SacI
(blunted) sites of pIL253 to generate the vector pTREX (Wells and Schofield, In
Current advances in metabolism, genetics and applications-NATO ASI Series, H
98:37-62 (1996)). The putative RNA stabilising sequence and TIR are derived from
the Escherichia coli T7 bacteriophage sequence and modified at one nucleotide
position to enhance the complementarity of the Shine Dalgarno (SD) motif to the
ribosomal 16S RNA of Lactococcus lactis (Schofield et al., pers. comm., University of
Cambridge Dept. Pathology).
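
When planning cloning steps of the kind described in this example, it can be useful to check where the relevant restriction enzyme recognition sites occur in a candidate sequence. The following Python sketch is illustrative only. The recognition sequences listed are the standard ones for the enzymes named in this example; the function name and the demonstration sequence are hypothetical. All of these sites are palindromic, so searching one strand is sufficient. NheI and XbaI recognise different sites but leave compatible cohesive ends, which is consistent with an NheI-ended fragment being cloned into an XbaI site.

```python
# Standard recognition sequences for the enzymes named in this example.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XbaI": "TCTAGA",
    "NheI": "GCTAGC",
    "SacI": "GAGCTC",
    "BglII": "AGATCT",
    "EcoRV": "GATATC",
}


def find_sites(sequence, sites=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for enzymes that cut `sequence`."""
    sequence = sequence.upper()
    hits = {}
    for enzyme, site in sites.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start)
            start = sequence.find(site, start + 1)  # allow overlapping matches
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    demo = "AAGAATTCTTGGATCCAAGAATTC"  # artificial demonstration sequence
    print(find_sites(demo))
```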

A Lactococcus lactis MG1363 chromosomal DNA fragment exhibiting promoter
activity which was subsequently designated P7 was cloned between the EcoRI and
BglII sites present in the expression cassette, creating pTREX7. This active promoter
region had been previously isolated using the promoter probe vector pSB292
(Waterfield et al., Gene, 165:9-15 (1995)). The promoter fragment was amplified by
PCR using the Vent DNA polymerase according to the manufacturer's instructions.

The pTREP1 vector was then constructed as follows. An artificial DNA fragment
which included a transcription terminator, the forward pUC sequencing primer, a
promoter multiple cloning site region and a universal translation stop sequence was
created by annealing two overlapping partially complementary synthetic
oligonucleotides together and extending with Sequenase according to the manufacturer's
instructions. The sense and anti-sense (pTREPF and pTREPR) oligonucleotides
contained the recognition sites for EcoRV and BamHI at their 5' ends respectively to
facilitate cloning into pTREX7. The transcription terminator was that of the Bacillus
penicillinase gene, which has been shown to be effective in Lactococcus (Jos et al.,
Applied and Environmental Microbiology, 50:540-542 (1985)). This was considered 

necessary as expression of target genes in the pTREX vectorsjvas observed to be ^ 

leaky and is: thought to be the result of cryptic promoter activity inthe origin region 
20 (Schbfield et al. pefs. corns. University of Cambridge Dept. Pathology.). The 

■ forward pUC primer sequencing was included to enable direct sequencing of cloned 
■* DNA fragments. The translation stop sequence which encodes a stop codon in 3 
■ " • - different frames was included to prevent translational fusions between vector .genes 
;^ * ~-atid cloned DNA fragments. The pTREXT vector was first digested with^EcpRi and 
-^25^ ^ blunted' usihg\the- 5 ? ' - 3' ;tk)Iyn^^ A 

" ' according^ manufact^^^ bliiht ended " r; 

: ; - ' : pTREX7 -vector was then digested witii Bgl;^ * 
^ - " artificial^DNA fragment derived- from the annealed syh^ \vasx 



then digested with EcoRV and Bam HI and cloned into the EcoRI(blunted)-Bgl II 
digested pTREX7 vector to generate pTREP. A Lactococcus lactis MG1363 
chromosomal promoter designated PI was then cloned between the EcoRI and Bglll 
sites present in the pTREP expression cassette forming pTREPi. This promoter was 
also isolated using the promoter probe vector pSB292 and characterised by 
Waterfield et al., (1995), supra. The PI promoter fragment was originally amplified 
by PGR using vent DNA polymerase according to manufacturers instructions and 
cloned into the pTREX as an EcoRI-BglH DNA fragment. The EcoRI-BgUI PI 
promoter containing fragment was removed from pTREXl by restriction enzyme 
digestion and used for cloning into pTREP (Schofield et al. pers. corns. University 
of Cambridge, Dept. Pathology.). 

(b) PCR amplification of the S. aureus nuc gene.

The nucleotide sequence of the S. aureus nuc gene (EMBL database accession
number V01281) was used to design synthetic oligonucleotide primers for PCR
amplification. The primers were designed to amplify the mature form of the nuc
gene, designated nucA, which is generated by proteolytic cleavage of the N-terminal
19 to 21 amino acids of the secreted propeptide designated Snase B (Shortle, (1983),
supra). Three sense primers (nucS1, nucS2 and nucS3, Appendix 1) were designed,
each one having a blunt-ended restriction endonuclease cleavage site for EcoRV or
SmaI in a different reading frame with respect to the nuc gene. Additionally, BglII
and BamHI sites were incorporated at the 5' ends of the sense and anti-sense primers
respectively to facilitate cloning into BamHI and BglII cut pTREP1. The sequences
of all the primers are given in Appendix 1. Three nuc gene DNA fragments encoding
the mature form of the nuclease (NucA) were amplified by PCR using each of
the sense primers combined with the anti-sense primer described above. The nuc
gene fragments were amplified by PCR using S. aureus genomic DNA template,
Vent DNA Polymerase (NEB) and the conditions recommended by the manufacturer.
An initial denaturation step at 93 °C for 2 min was followed by 30 cycles of
denaturation at 93 °C for 45 sec, annealing at 50 °C for 45 sec, and extension at
73 °C for 1 min, and then a final 5 min extension step at 73 °C. The PCR
amplified products were purified using a Wizard clean up column (Promega) to
remove unincorporated nucleotides and primers.
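
The reading-frame arrangement of the three sense primers can be checked computationally. The following Python sketch is illustrative only and is not part of the method described. It takes the primer sequences as listed in Appendix 1, with the unreadable character in the nucS3 site read as g so that the site matches its EcoRV label (an assumption). The function names are hypothetical, and the longest 3' suffix common to the three primers is used here as a stand-in for the nuc-specific region. For each primer the script counts the bases lying between the blunt cut point (GAT^ATC for EcoRV, CCC^GGG for SmaI) and that shared region, and reports the count modulo 3.

```python
# Sense primers from Appendix 1 (5'->3'); the nucS3 site is read as gatatc (assumption).
PRIMERS = {
    "nucS1": "cgagatctgatatctcacaaacagataacggcgtaaatag",
    "nucS2": "gaagatcttccccgggatcacaaacagataacggcgtaaatag",
    "nucS3": "cgagatctgatatccatcacaaacagataacggcgtaaatag",
}

# Blunt cutters: recognition site and position of the cut within the site.
BLUNT_SITES = {"EcoRV": ("gatatc", 3), "SmaI": ("cccggg", 3)}


def common_suffix(seqs):
    """Return the longest suffix shared by every sequence in the list."""
    shortest = min(len(s) for s in seqs)
    n = 0
    while n < shortest and len({s[-(n + 1)] for s in seqs}) == 1:
        n += 1
    return seqs[0][len(seqs[0]) - n:]


def frame_offsets(primers):
    """Map primer name -> (enzyme, bases from blunt cut to shared region, count mod 3)."""
    shared = common_suffix(list(primers.values()))
    result = {}
    for name, seq in primers.items():
        shared_start = len(seq) - len(shared)
        for enzyme, (site, cut) in BLUNT_SITES.items():
            pos = seq.find(site)
            if pos != -1:
                gap = shared_start - (pos + cut)
                result[name] = (enzyme, gap, gap % 3)
    return result


if __name__ == "__main__":
    for name, (enzyme, gap, frame) in frame_offsets(PRIMERS).items():
        print(name, enzyme, gap, frame)
```

Under these assumptions the three primers give three different remainders, which is consistent with the statement that each blunt site lies in a different reading frame with respect to the nuc gene.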

(c) Construction of the pTREP1-nuc vectors

The purified nuc gene fragments described in section (b) were digested with BglII and
BamHI using standard conditions and ligated to BamHI and BglII cut and
dephosphorylated pTREP1 to generate the pTREP1-nuc1, pTREP1-nuc2 and
pTREP1-nuc3 series of reporter vectors. General molecular biology techniques were
carried out using the reagents and buffer supplied by the manufacturer or using
standard conditions (Sambrook and Maniatis, (1989), supra). In each of the pTREP1-nuc
vectors the expression cassette comprises a transcription terminator, lactococcal
promoter P1, unique cloning sites (BglII, EcoRV or SmaI) followed by the mature
form of the nuc gene and a second transcription terminator. Note that the sequences
required for translation and secretion of the nuc gene were deliberately excluded in
this construction. Such elements can only be provided by appropriately digested
foreign DNA fragments (representing the target bacterium) which can be cloned into
the unique restriction sites present immediately upstream of the nuc gene.

In possessing a promoter, the pTREP1-nuc vectors differ from the pFUN vector
described by Poquet et al. (1998), supra, which was used to identify [...]
exported proteins by screening for Nuc activity directly in L. lactis. As the
pFUN vector does not contain a promoter upstream of the nuc open reading frame,
the cloned genomic DNA fragment must also provide the signals for transcription in
addition to those elements required for translation initiation and secretion of Nuc.
This limitation may prevent the isolation of genes that are distant from a promoter,
for example genes which are within polycistronic operons. Additionally, there can be
no guarantee that promoters derived from other species of bacteria will be
recognised and functional in L. lactis. Certain promoters may be under stringent
regulation in the natural host but not in L. lactis. In contrast, the presence of the P1
promoter in the pTREP1-nuc series of vectors ensures that promoterless DNA
fragments (or DNA fragments containing promoter sequences not active in L. lactis)
will still be transcribed.



(d) Screening for secreted proteins in S. pneumoniae

Genomic DNA isolated from S. pneumoniae was digested with the restriction
enzyme Tru9I. This enzyme, which recognises the sequence 5'-TTAA-3', was used
because it cuts A/T rich genomes efficiently and can generate random genomic
DNA fragments within the preferred size range (usually averaging 0.5 - 1.0 kb).
This size range was preferred because there is an increased probability that the P1
promoter can be utilised to transcribe a novel gene sequence. However, the P1
promoter may not be necessary in all cases as it is possible that many Streptococcal
promoters are recognised in L. lactis. DNA fragments of different size ranges were
purified from partial Tru9I digests of S. pneumoniae genomic DNA. As the Tru9I
restriction enzyme generates staggered ends, the DNA fragments had to be made
blunt ended before ligation to the EcoRV or SmaI cut pTREP1-nuc vectors. This was
achieved by the partial fill-in enzyme reaction using the 5'-3' polymerase activity of
Klenow enzyme. Briefly, Tru9I digested DNA was dissolved in a solution (usually
between 10-20 µl in total) supplemented with T4 DNA ligase buffer (New England
Biolabs; NEB) (1X) and 33 µM of each of the required dNTPs, in this case dATP
and dTTP. Klenow enzyme was added (1 unit Klenow enzyme (NEB) per µg of
DNA) and the reaction incubated at 25 °C for 15 minutes. The reaction was stopped
by incubating the mix at 75 °C for 20 minutes. EcoRV or SmaI digested pTREP-nuc
plasmid DNA was then added (usually between 200-400 ng). The mix was then
supplemented with 400 units of T4 DNA ligase (NEB) and T4 DNA ligase buffer
(1X) and incubated overnight at 16 °C. The ligation mix was precipitated directly in
100% ethanol and 1/10 volume of 3 M sodium acetate (pH 5.2) and used to
transform L. lactis MG1363 (Gasson, 1983). Alternatively, the gene cloning site of
the pTREP-nuc vectors also contains a BglII site which can be used to clone, for
example, Sau3AI digested genomic DNA fragments.
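
The digestion and fill-in steps above can be illustrated with a short Python sketch. It is a simplified model and not the procedure itself: it assumes a linear sequence, complete digestion, a cut position of T^TAA (leaving a 5'-TA overhang), and a random sequence with equal A and T frequencies. The function names, the toy sequence and the A+T fraction of 0.6 are hypothetical values chosen for illustration.

```python
def tru9i_fragments(seq):
    """Cut a linear sequence at every TTAA site (T^TAA) and return the fragments."""
    seq = seq.upper()
    cuts = []
    pos = seq.find("TTAA")
    while pos != -1:
        cuts.append(pos + 1)              # the cut falls after the first T
        pos = seq.find("TTAA", pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def expected_site_spacing(at_fraction):
    """Mean distance between TTAA sites in a random sequence with this A+T fraction."""
    per_base = at_fraction / 2.0          # assumes A and T are equally frequent
    return 1.0 / per_base ** 4


def fill_in_dntps(overhang_5prime):
    """dNTPs needed to fill in the recessed 3' end opposite a given 5' overhang."""
    complement = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return {complement[base] for base in overhang_5prime.upper()}


if __name__ == "__main__":
    toy = "GGGTTAACCCCTTAAGG"             # hypothetical toy sequence, not pneumococcal DNA
    print([len(f) for f in tru9i_fragments(toy)])
    print(round(expected_site_spacing(0.6)))   # 0.6 is an assumed A+T fraction
    print(sorted(fill_in_dntps("TA")))
```

In this model a complete digest of an A/T rich genome would yield fragments averaging far below 0.5 kb, which may explain why partial digests were used to obtain the preferred size range. The 5'-TA overhang needs only dATP and dTTP for fill-in, in agreement with the dNTPs named in the text.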

L. lactis transformant colonies were grown on brain heart infusion agar and nuclease
secreting (Nuc+) clones were detected by a toluidine blue-DNA-agar overlay (0.05
M Tris pH 9.0, 10 g of agar per litre, 10 g of NaCl per litre, 0.1 mM CaCl2, 0.03%
wt/vol. salmon sperm DNA and 90 mg of Toluidine blue O dye) essentially as
described by Shortle (1983), supra, and Le Loir et al. (1994), supra. The plates were
then incubated at 37 °C for up to 2 hours. Nuclease secreting clones develop an easily
identifiable pink halo. Plasmid DNA was isolated from Nuc+ recombinant L. lactis
clones and DNA inserts were sequenced on one strand using the NucSeq sequencing
primer described in Appendix 1, which sequences directly through the DNA insert.

Isolation of Genes Encoding Exported Proteins from
S. pneumoniae

A large number of gene sequences putatively encoding exported proteins in S.
pneumoniae have been identified using the nuclease screening system. These have
now been further [...]. The sequences [...]
screening system have been analysed using a number of parameters.

1. All putative surface proteins were analysed for leader/signal peptide
sequences using the software programs Sequencher (Gene Codes Corporation) and
DNA Strider (Marck, Nucleic Acids Res., 16:1829-1836 (1988)). Bacterial signal
peptide sequences share a common design. They are characterised by a short
positively charged N-terminus (n-region) immediately preceding a stretch of
hydrophobic residues (central portion, h-region) followed by a more polar C-terminal
portion which contains the cleavage site (c-region). Computer software is available
which allows hydropathy profiling of putative proteins and which can readily identify
the very distinctive hydrophobic portion (h-region) typical of leader peptide
sequences. In addition, the sequences were checked for the presence or absence of
a potential ribosomal binding site (Shine-Dalgarno motif) required for translation
initiation of the putative nuc reporter fusion protein.

2. All putative surface protein sequences were also matched with all of the
protein/DNA sequences using the publicly available databases [OWL-proteins inclusive of
SwissProt and GenBank translations]. This allows us to identify sequences similar to
known genes or homologues of genes for which some function has been ascribed.
Hence it has been possible to predict a function for some of the genes identified
using the LEEP system and to unequivocally establish that the system can be used to
identify and isolate gene sequences of surface associated proteins. We should also be
able to confirm that these proteins are indeed surface related and not artifacts. The
LEEP system has been used to identify novel gene targets for vaccine and therapy.

3. Some of the genes identified encoded proteins which did not possess a typical leader
peptide sequence and did not show homology with any DNA/protein sequences in
the database. Indeed these proteins may indicate the primary advantage of our
screening method, i.e. the isolation of atypical surface-related proteins which may
have been missed in all previously described screening protocols or approaches
based on sequence homology searches.
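
The hydropathy profiling mentioned in item 1 above can be sketched as a sliding-window average over a per-residue hydrophobicity scale. The Python example below is a minimal illustration under stated assumptions: it uses the Kyte-Doolittle scale, a window of 9 residues and hypothetical function names, none of which is specified in the text, and it does not reproduce the internal workings of Sequencher or DNA Strider.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=9):
    """Mean hydropathy of every window of the given length along the protein.

    Raises KeyError for letters outside the 20 standard residues, so a trailing
    stop symbol must be removed by the caller.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def most_hydrophobic_window(protein, window=9):
    """Return (start index, mean hydropathy) of the most hydrophobic window."""
    profile = hydropathy_profile(protein, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, profile[start]


if __name__ == "__main__":
    # Hypothetical toy peptide: charged N-terminus, hydrophobic core, polar tail.
    toy = "KKKKLLLLLLLLLDDDD"
    print(most_hydrophobic_window(toy))
```

A strongly positive window located close to the N-terminus, preceded by positively charged residues, is the kind of feature that would flag a candidate h-region for closer inspection.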

In all cases, only partial gene sequences were initially obtained. Full length genes
were obtained in all cases by reference to the TIGR S. pneumoniae database
(www.tigr.org). Thus, by matching the originally obtained partial sequences with
the database, we were able to identify the full length gene sequences. In this way, as
described herein, three groups of genes were clearly identified, i.e. a group of genes
encoding previously unidentified S. pneumoniae proteins, a second group exhibiting
some homology with known proteins from a variety of sources and a third group
which encoded known S. pneumoniae proteins, which were, however, not known as
antigens.



Appendix I - Oligonucleotide primers

nucS1

Bgl II Eco RV
5'- cgagatctgatatctcacaaacagataacggcgtaaatag -3'

nucS2

Bgl II Sma I
5'- gaagatcttccccgggatcacaaacagataacggcgtaaatag -3'

nucS3

Bgl II Eco RV
5'- cgagatctgatatccatcacaaacagataacggcgtaaatag -3'

nucR

Bam HI
5'- cgggatccttatggacctgaatcagcgttgtc -3'

NucSeq

5'- ggatgctttgtttcaggtgtatc -3'

pTREPF

5'- catgatatcggtacctcaagctcatatcattgtccggcaatggtgtgggcttttmgttttagcggataa
caatttcacac -3'

pTREPR

5'- gcggatcccccgggcttaattaatgtttaaacactagtcgaagatctcgcgaattctcctgtgtgaaatt
gttatccgcta -3'

pUCF

5'- cgccagggttttcccagtcacgac -3'

Vr

5'- tcaggggggcggagcctatg -3'

V1

5'- tcgtatgttgtgtggaattgtg -3'

V2

5'- tccggctcgtatgttgtgtggaattg -3'
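
The restriction sites annotated above the nuc primers can be confirmed by a simple string search. The Python sketch below is illustrative only; the function names are hypothetical, and the sequences are those listed above for nucS1, nucS2 and nucR. It also forms the reverse complement of the anti-sense primer nucR to show how that primer reads on the sense strand.

```python
# Recognition sequences of the enzymes named in the primer annotations.
SITES = {"BglII": "agatct", "BamHI": "ggatcc", "EcoRV": "gatatc", "SmaI": "cccggg"}

# Primer sequences as listed in Appendix I (5'->3').
NUC_S1 = "cgagatctgatatctcacaaacagataacggcgtaaatag"
NUC_S2 = "gaagatcttccccgggatcacaaacagataacggcgtaaatag"
NUC_R = "cgggatccttatggacctgaatcagcgttgtc"


def find_sites(seq):
    """Return {enzyme: index of first occurrence} for every site present in seq."""
    seq = seq.lower()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (case is preserved)."""
    table = str.maketrans("acgtACGT", "tgcaTGCA")
    return seq.translate(table)[::-1]


if __name__ == "__main__":
    for name, seq in (("nucS1", NUC_S1), ("nucS2", NUC_S2), ("nucR", NUC_R)):
        print(name, find_sites(seq))
    print(reverse_complement(NUC_R))
```

Read on the sense strand, nucR places a taa triplet immediately ahead of the BamHI site; whether that triplet is in frame with the nuc coding sequence depends on the template and is not verified by this sketch.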



TABLE 1



ID4 1200 bp 

ATGAGAAATATGTGGGTTGTAATCAAGGAAACCTATCTTCGACATGTCGAGTCATGGAGTTTCTTCTTTATGGTGA
TTTCGCCGTTCCTCTTTTTAGGAATCTCTGTAGGAATTGGGCATCTCCAAGGTTCTTCTATGGCTAA 
GTGGCAGTAGTGACAACAGTGCCATCTGTAGCAGAAGGACTGAAGAATGTAAATGGTGTTAACTTCGACTATAAA 
GACGAAGCAAGTGCCAAAGAAGCAATTAAAGAAGAAAAATTAAAAGGTTATTTGACCATTGATCAAG 

GTTCTAAAGGCAGTTTATCATGGCGAAACATCGCTTGAAAATGGAATTAAATTTGAGGTTAC
AACTGCAAAATCAGCTTAATCGTTCAACTGCTTCCTTGTCTCAAGAGCAGGAAAAA 
ATTCACAGAAAAGATTGATGAAGCCAAGGAAAATAAAAAGTTTATTCAAACAATTGCAGCAGGT 
CTTTCTTTATATGATTCTGATTACCTATGCGGGTGTAACAGCTCAGGAAGTTGCCAGTGAAAAAGGCACCAAAA^ 
ATGGAAGTCGTITTTTCTAGCATAAGGGCAAGTCACTATTTC 

AACGCATATTGGGATCTATGTTGTAGGTGGTCTGGCTGCCGTTTTGCTCTTTAAAGATTTGCCATTCT^

CTGGTATTTTGGATCACTTGGGAGATGCTATCTCACTGAATACCTTGCTCTTTATTTTGATCAGTCTTTTCATGT^ 
GTAGTCTTGGCAGCCTTCCTAGGATCTATGGTTTCTCGTCCTGAGGACTCAGGGAAAGCCTTGTCGCCTTTGATGA 
TTTTGATTATGGGTGGTTTTTTTGGAGTGACAGCTCTAGGTGCAGCTGGTGACAATCTCCTCTTGAAGA 
TATATTCCCTTTATTTCGACCTTCTTTATGCCGTTTCGAACGATTAATGACTATGCGGGGGGAGCA 

TTTCACTTGCTATTACAGTGATTTTTGCGGTGGTAGCAACAGGATTTATCGGACGCATGTATGCTAGTCTCGTT
CAAACGGATGATTTAGGGATTTGGAAAACCTTTAAACGTGCCrTATCTTATAAATAG 

MRNMWVVIKETYLRHVESWSFFFMVISPFT^FLGISVGIGHLQ 
ASAKEAIKEEKLKGYLTTOQEDSVLKAVYHGETSLENGH^ 
AKENKKFIQTIAAGALGFFLYMILrn'AGVTAQEVASEKGTKmEVVFSSIRASHYF^
AAVLLFKDLPFLAQSGILDHLGDAISLNTLLFILISLFMYVVLAAFLGSM^ 
GDNLLLKIGSYIPFISTFFMPFRTINDYAGGAEAWISLArrVIFAVVATGFIGRMYASLVLQTO 

ID5 1125 bp 


CCTGGGAAAGTCTTGAAAATTATGATAGAATGGTGGAAGGAAAAATTCAGGAGAGTAGTAGTGACTCAAAATGTT 
GAAAGTCTTCTCGTATCCATTGTAATCAGTGCATACAATGAAGAAAAATATCTGCCTGGTCTAATTGAAGACTTAA 
AAAATCAAACCTATCCTAAAGAGGATATTGAAATTCTATTTATAAATGCTATGTCCACAGATGGGACCACAGCTAT 
CATTCAGCAATTTATAAAGGAAGATACAGAGTTTAACTCAATTAGATTG 

TAGTGGTTTTAACCTGGGAGTTAAACATTCTGTAGGGGACCTTATTTTAAAAA

GAGACTTTTGTAATGAACAATGTGGCTATTATTCAACAAGGTGAATTTGTCTGTGGGGGGCCTAGACCGACGATTG 
TCGAAGGAAAAGGAAAATGGGCAGAGACCTTGCATCTTGTTGAGGAAAATATGTTTGGCAGTAGCATTGCCAATT 
ATCGAAATAGTTCTGAGGATAGATATGTTTCTTCTATTTITCATGGAATGTA . 
TGGTTTAGT.AAATGAGCAACTTGGCCGAACTGAAGATAATGATA1TCATTATAGAATT 

ATCCGCTATAGCCCAAGTATTCTATCTTATCAGTATATTCGACCAACATTCAAGAAAATGCTGCATCAAAAGTATT
GAAATGGTTTGTGGATTGGCTTGACAAGTCATGTTCAGTTTAAGTGTTTATCATrATTTCACTATGT^ 
TTTGTTTTGAGTCTTGTGTTTAGTCTAGCATTGTrACCGATCACATTCGTATTCATAACTTTACT^ 
TTTCTACTTTTGTCATTACTCACTTTGCTGACTTTATTAAAACATAAAAATGGATT^ 
TTATTTTCCATTCACTTTGCTTATGGCCTTGGGACGATT 

ACAAGAGAACAATAATTTATTTGGATAAAATAAGCCAAATAAATCAAAATATGCTATAA

PGKVLKiMIEW\VTCEKFRRWVTQNVESLLVS^ 
EDTEFNSIRLYNNPKKNQASGFNLGVKHSVGDLILKIDAH^ 
LHLVEENMFGSSIANYRNSSEDRYVSSIFHGMYKREVFQKV 
TFKKMLHQKYSNGLWIGLTSHVQFKCLSLFHYVPCLFVLSLVFSLALLPITFVFITLLLGAYFLLLSLLTL
LrVMPFlLFSIHFAYGLGTIVGLIRGFKWKKEYKRTIIYLDKISQINQNMLZ 

JDVi 696 bp 

ATCATGAAAGAACAAAATACGATAGAAATCGATGTATTTCAATTAGTTAAAAGCTTGTGGAAACGCAAGCTAAT^
ATTriAATAGTGGCACTTGTGACAGGTGCGGGGGCTTTTGCATATAGCACTTTTATTGTTA^ 

GTACCACGCGAATTTACGTAGTGAATCGCAATCAAGGAGACAAGCCGGGGTTGACAAATCAGGATTTGCAGGCAG 
GAj\C7^ATCTGGTAAA AGACTACCGTGAGATTATCCTTTCGCAGGATGTTTIGGAGGAAGTTGTTTCTGA 
ACTAGATTTGACGCCAAAAGGTrTGGCTAATAA.AATTAAAGTGACAGTACCAGTTGATACCCGTATTGTCTCTATr 
60- . TC AGTTAATG ATCGAGTFCCT uAAGAGGCAA GCCGTATCGCTAACTCTTTGAGAG AAGTAGCTGCTCAAA AAATTA 
TCAGTATi ACTCGTGTlTCTGACGTGACAACACTGGAGGAGGCAAGGCCGGCGATATCCCCGTCTTCGCCAAATAT 
TA^\ACGCAATACACTAATTGGTTTTTTGGCAGGGGTGAT^ 
atactcgtgtgaAacgtccggaagatatcga^
gtaagttgaaatag 

MMKEQNTIEIDVFQLVKSLW^ 

VKDYREIILSQDVLEEWSDLKLDLTPKGLANKIKVTVPVDTPJVSISVNDRVP 
I^EAKPAISPSSPNIXRNTUGFI^GVIGTSVIVLHLEU^ 

ID19 555 bp

ATGGTAAAAGTAGCAGTTATATTAGCTCAGGGCTTTGAAGA 

GAGCCAATATCACATGTGATATGGTTGGTTTTGAAGAGCAAGTAACGGGTTCGCATGCAATCCAAGTAAGAGCAG 

ATCATGTCTTTGATGGAGATTTATCAGACTATGATATGATTGTTCTTCCTGGAGGTATGCCTGGTTCTGCACATTTA 

CGTGATAATCAGACCTTGATTCAAGAATTGCAAAGCTTCGAGCAAGAAGGGAAGAAACTAGCAGCCATTTGTGCG 

GCACCAATTGCCCTCAATCAAGCAGAGATATTGAAAAATAAGCGATACACTTGTTATGACGGCGTTCAAGAGCAA 

ATCCTTGATGGTCACTACGTCAAGGAAACAGTAGTGGTAGATGGTCAGTTGACAACCAGTCGGGGTCCTTCAACA 

GCCCTTGCCTTTGCCTACGAGTTGGTGGAGCAACTAGGAGGGGACGCAGAGAGTTTACGAACAGGAATGCTCTAT 

CGAGATGTCTTTGGTAAAAATCAGTAA 

MVKVAVILAQGFEEIEALTWDVLRRANITCDMVGFEEQVTGSHAIQVRADHVFDGDLSDYDMIVLPGGM 

DNQTLIQELQSFEQEGKKLAAICAAPI^^ 
YELVEQLGGDAESLRTGMLYRDVFGKNQZ 

ID27 306 bp 

GTGGTAGGGATGGTAGAACCAAACCTAGAAAGCCTTATAAAAGATCTTTACAATCATGCTCGACATGATTTGAGT 
GAAGATTTAGTTGCTGCTCTCCTAGAGACTACTAAAAAACTGCCTACTACAAATGAGCAATTGCAGGCAGTTCG 
TCTCAGGCCTGGTCAATCGTGAATTGCTCCTAAATCCCAAACATCCAGCACCTGAGTTGCTCAACTTGGCTCGCTT 
TGTCAAAAGAGAAGAAGCCAAGTACAGAGGAACTGCGACTTCTGCGCTTATGTATGAGGAACTCTTTAAAATGCT 

TTGA 

MVGMVEPNLESLIKDLYNHARHDLSEDLVAALLETT^ 
REEAKYRGTATSALMYEELFKMLZ 

ID29 945 bp 

TTGTTCXTAAAAAAGGAAAGAGAGGTAATCAGCATGCGTAAATG 

ACTACCGTTATCGGCTTTATCCTGCTTTTTGTAGGTATCCAATCTGACGGGAtTAAGAGCCTACTTTCCATGTCCAA 
AGAACCTGTCTATGATAGCCGTACGGAAAAGCTAACCTTTGGCAAGGAAGTCGAAAACCTAGAAATTACTCTCCA 

CCAACACACGCTCACCATCACAGACTCTTTCGATGATCAAA 

ATGATCTl^TCACCAATCAGAACGATAGAAGTeTGAGTgTCAGTGATAAGAAACTGTCTGAAACTCCG 
TTCTGGAAITGGTGGGATTCTT^ 

AAAGGGAGAACTCTAAAAGGGATCAACATCTCAGCCAATCGC 

AATGCGACCCTCAATACAAAC AGCTATATCCTCCGAATTGAAGGAAGTCGTATCAAAAAC AGTAAACTCACAACG 

- cccaatAtcgttaatatctttg 

; " aAAATATCC AAGTCC ATGGC AAGGTTGAACTGACTGCC AAAGATTATCTCAG AATCATCCTAG^ 
' GCC AACGAATTAACTGGGAC ATCTC AAGCAACTATGGTTCTATCTTCC AATTC AC AAG AGAAAAGCCTGAATC AA 
GAGGTACGGAA1TAAGCAACCCTTACAAAACTG 

ATAATATTGATCTAATATCCACACCAAGCAGACpTTGA ; 0 t 

/ * MFLKKEREV1SMRKWTKGFLIFGV VTm 

TOSFDD^ 
'^GQ^IINASLENATl-NTNSYI^ 

* l QKESQRINWDISSN^ " ■ ^V^Z'^V . 

ID30 879 bp

"^XGGrtGCAGACAAGGCTGAAG 

GAAGTCCCTCAAGCTGAAGTCGAATTGGAAAGCCAGCAAGAAGAGAAAATTGAAG^^ 

^GAG^CAG^^ 

^aagtcactat^ 

* ^TsikAT cmAGAAAGTCOTATATCC'CC 

A-TTT^GGTCTXGGCTAGTGGAAGCGATCAAATC^ 
AGCCTITCTCrtGCTCATTCTGTTTTCT^ 



ACATATAGCAAGCATTAACAGTCGCTTCCCTGAG 

TAGCGACAACACTCTTCTTCTTTTCATTCCTCTTGGGTAGTITCGTTGTGAGACGATTTATCCACCAGGAAAAGG 

CTGGACGCTAGACAAGGTTCTCCAACAATATAGTCAACTCTTGGCAATTCCAATCTCCTCACTGCTATTGCTAGTT 

TCTTTGCTITCTTTGATAGCCTACGATTTACAGCCCTCTTGTGTGTGA 

MKQEWFESNDFVKTTSKNK^EEQAQEVADKAEETIADLDTPIEKNTQLEEEVPQAEVELESQQEEKIEAPE 
EKKASNSTEEEPDI^KETEKVTIAEESQEALPQQKATTKEPUJSKSLESPTO 
PTSKLETSITHSYTAFLLLXLFSASSFFFSIYfflK^ 
QEKDWTLDKVLQQYSQLLAIPISSLLLLVSLLSLIAYDLQPSCVZ 

ID105 990 bp 

ATGCAACTCGCTTCTTCGGTCTACTCATTGTTCGTCTGGTACAATTTGTTCTTAAAAAAGGAAAGA 
GCATGCGTAAATGGACAAAAGGATTTCTCATCTTTGGTGTGGTGACTACCGT^ 

GGTATCCAATCTGACGGGATTAAGAGCCTACTTTCCATGTCCAAAGAACCTGTCTATGATAGCCGTACGGAAAAG 

CTAACCTTTGGCAAGGAAGTCGAAAACCTAGAAATTACTCTCCACCAACACACGCTCACCATCACAGACTCTTTCG 

ATGATCAAATCCACATTTCTTACCATCCATCTCTTTCTGCTCACCATGATCTTATCACCAATCAGAACGATAGAAC 

TCTGAGTCTCACTGATAAGAAACTGTCTGAAACTCCGTTTCTCTCTTCTGGAATTGGTGGGATTCTTCATATCGCAA 

GTAGCTACTCTAGTCGTTTTGAAGAAGTTATTCTCCGACTACCAAAAGGGAGAACTCTAAAAGGGATCAACATCTC 

AGCCAATCGCGGACAAACCACCATCATAAATGCTAGCCTTGAAAATGCGACCCTCAATACAAACAGCTATATCCT 

CCGAATTGAAGGAAGTCGTATCAAAAACAGTAAACTCACAACGCCCAATATCGTTAATATCTTTGATACAGTTCT^ 

ACAGATAGTCAGCTAGAGTCAACAGAGAATCACTTCCACGCTGAAAATATCCAAGTCCATGGCAAGGTTGAACTG 

ACTGCCAAAGATTATCTCAGAATCATCCTAGACCAGAAAGAAAGCCAACGAATTAACTGGGACATCTCAAGCAAC 

TATGGTTCTATCTTCCAATTCACAAGAGAAAAGCCTGAATCAAGAGGTACGGAATTAAGCAACCCTTACAAAACT 

GAAAAAACCGATGTCAAGGATCAACTCATTGCGAGATCTGATGATAATATTGATCTAATATCCACACCAAGCAGA 

CGTTGA 

MQLASSVYSLFVWYNLFLKKEREVISMRKWTKGF 

EVENLEITLHQHTLTITDSFDDQIHISYHPSLSAHHDLn*NQNDRTLSLTDKKLSETPFl^SGIGGILHIASSYSS 

LPKGRTLKGINISANRGQTTIINASLENATLNTNSYILRffiGSRIK^ 
GKVELTAKDYLRDX.DQKESQRINWDISSNYGSIFQFniEKPESRGTELSNPYKTEK 

ID107 78 bp

ATGATATGTAAAATGAAGCAGGGAGGGAGCAGGGCGTGCTGGGGATGGAGAGTGGGGGAGGGACGCTGCTATTTT 
AATC 

MICKMKQGGSRACWGWRVGEGRCYFN
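
The relationship between each nucleotide sequence in the tables and the protein sequence listed beneath it can be checked by conceptual translation. The Python sketch below is illustrative only and assumes the standard genetic code. The function name is hypothetical, and the stop symbol Z follows the convention visible at the end of other protein sequences in these tables. The ID107 sequence is copied as it appears above, where it carries one base more than the stated 78 bp; the sketch translates complete codons only and ignores any trailing partial codon.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, stop_symbol="Z"):
    """Translate complete codons from the first base; stops are written as stop_symbol."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3      # ignore a trailing partial codon
    protein = []
    for i in range(0, usable, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


# ID107 nucleotide sequence as listed above (two lines joined).
ID107 = (
    "ATGATATGTAAAATGAAGCAGGGAGGGAGCAGGGCGTGCTGGGGATGGAGAGTGGGGGAGGGACGCTGCTATTTT"
    "AATC"
)

if __name__ == "__main__":
    print(translate(ID107))
```

The same function can be applied to other entries in the tables, although many of them are incomplete as extracted and contain characters that are not nucleotides, for which the lookup raises a KeyError.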



ID109 714 bp

CGATAAAGAGGCCTTGAGTAATCTCAATTTGCAGATTGAAAATGGAGAGATTATGGGCTTGATTGGTCATAATGG 
GGCTGGAAAATCGACCACTATAAAATCCCTAGTCAGTATCATTTCACCCAGCAGTG 

CAGGAGTTATCGGAAAATCGCTTGGCTATTAAACGAAAGATTGGCTACGTAGCAGACTCGCCTGACTTA'I'l'l'I'iAC 
GCTTAACGGCCAATGAATTTTGGGAATTGATCGCCTCATCCTATGATCTGAGTAGATCTGACTTGGAGGCTAGTCT 

AGCTAGGCTATTGAACGTTTTTGATTTTGCTC 

AGAAXGTCTTrGTCATCGGAGCACTCTTGTCTGATCCCGAT^ 

TCCCCAGGCTGCCTTTGATTTGAAACAGATGATC 

TCATGTGCTAGAGGTGGCAGAGCAAGTCTGTGATCGGATTGCCATTTTGAAAAAGGGGCATTTGATTTATTGTG 
AAGGTAGAGGACTTGAGGAAAGACCACCCAGACCAGTCTTTGGAAAGTATCTACCTTAGTCTTGCTGGTAGAAAA 

GAGGAGGTTGCGGATGCGTCTCAAGGTCATTAA 

- bKEAt^NLQIENGEIMGLIGHNGAGKSTTIKSLVSHSPSSGRILVDGQELSEN 
" WELIASSYDL'SkSDLEASLARLLNVFDFAEN^^ 
r MM^HAQKGKTVLFSTHVLEVAEQVCDRIAILKXGHLIYCGKVEDLRK^ 

ID112 360 bp 

ATGGCTTTGTTTTCAGAGAGAOGAUCAGTACGGAAGACACCAATGGCAAGTCCAATAATGAGACCTATGATGGTT 
. CCG ACGATAGAGATJ:AAAAG AGTGnTACC AGC ACCACGCAAG AGTTGTTGCC AGTTTTC AGAAAGA ATTTTAGC A 
ACTTbGC^AAA GA^ctaCTGCTAGTCTCTTCAGTTGTTGTAGCTTCGGCAGGTI'GTTCCTTGATCATACGATCCAT 
C^GGCAACTTGGTCATeTTTTGAAATGGTTTC

GCCCGATAGCGATAGCTGTATCTTCTTCCCCAGTTTTGAAACCAGGTTCTACTTGA 

MALFSERGAVRKTPMASPIMRPMMVPTIEIKRVIPAPRKSCCQFSERILATWLK^ 
FB^SMLALIWLJRi^FLRSPLAXAVSSSPVL.K^GSTZ 






TABLE 2 






ID2 840 bp 

™™. ^-r-r- . attttacatatcaagaaggtactcccttagcttcagcagctttgtcggatg 

t^™ac^ga^ 

xSS^a^^ 



cggt^gaaggacg^t^ 

rAAA^rrTrTGG^GGAATTGATGAATCACTTm 
TGGATAG 



fmeevqlgvpkitafckrladrgvsfkrlpikieefkeslngz 



ID3 6360 bp

atta^gg^a^a^aga^gttgttagta^aaatcctgtgatagacaataacacta 

G^G^^A^A^^ 

GA^GATAAAGTTGTCTATATTGCTGAATTTAAAGATA^^ 

ot^g^acWgtttta^ 
?gcc^S?ctcStcagIt^aaa^^ 



^TITATACAAAGGATTTAAAA^ATGCTTTTA 
CTGTAAATTACTACAATAGAGATAATTGGA 

CAGAGCTTCCAG^T^^^ 



gtgtaaagctatggaacatgattaatcctgataaaaaaactgaa^^ 

ata a irmrArr A ATArTATCCAATTGATATGGAAAGTTTTAATTCCAACAAACCGAATGTAGGTGACUAAAAAU 
TA^cSg^^aTt^^^^ 

ta^tg^attXtg^^^^ 

TACAACATCCCTGGATCAAAATCCAGAATTATTTGCTTTCAATAACGAAGGGATCAACGCTCCATCATCAAGIGOI 



AAQGTC/^TATrTCTATA'AATrrAAATi^AS 
ATT'rATr^AAATATCAAAAGCTTGGCGAGATTGCAGAATCT 

g^aaaI^ga^ 

^L^^^.^.l^r^T^r.^,^, a tt ir.ir.i rTTrr.A A A ATA A AGACTTAAAGAAACTCATTAAAAAGAAATI 



GTCTAAAAAAAGATACAACTGGGGTAGAACATC ATCATCAAGAAAA 1 uAAUAU i 1«. 1 1 ^^???? i ; A 7;rr 

™?g^otg£^^ 

TTATGGTGTTCTAAGTCCGTCTAAAGATGGACACTTTGAAATTC^ 



VTrTr^^A^mAA^TOAACOATAAAGGra^ 



t^a™gctgaatcttaaagacggt^ 

CA^T^GCT^CAGATATGGATGG 

ag^a^tat^gt^^ 
35 tatcctgItggaaIa^™ 

atVa^gattgttgtaaaagattitgcaagaaatacaaccgtaaaa 
gtaagtgaattaaaacctcatagggta^^ 

r . vp A Ar , ATT TTATTITACCTGTTTATAAGGGTGAATTAGAAAAAGGATACCAATTTGATGGTTGG 1 1 ^ io 

^o gt^aTg^ 

: ■ ' . . A AT A n AfrGAGAAAAAGGAGGAAG AAAATAAACCTACTTTTGATGTATCG AAAAAG AAAGAT 



7: i«^^^s^sgsi^^S^^S^ ? v, 

45 '"'"*' tggattgaagaaaaaaaatcaagattaa 1." . . : ' 


, , V, ^'^QNT^lMPAtSWI^KSQYFASPKQQ^^ • . 

ene: 



FTKSKIYGVI^PSKDGHFEILGKISNVSKNAKVYYGNNYKSIEIKATKYDFHSKTMTFDLYANLNDIVDGLAFAGDMRLF 
EFYLRG^D^GFNWELRVNESW^ 

GWElSGFEGKffl3AGYVTWLSKDTFIX?VKCiGEEX££^^ — <s— ^-^-c^- 

TKDVTATVLDKNNISSKSTTNNPNKLPKTGTASGAQTLLAAGIMFIVGIFLGLKKKNQDZ 

ID6 597 bp

CTTGAATTAAATAAAAAACGTCATGCGACTAAGCATTTTACTGATAAGCTTGTTGATCCCAAAGATGTGCGTACGG 
ATGCTGAACTCGCAAAGTT^lGCTTATGGTTCCAATTTTG 

tISga^cggac^gccaaacgtgctcgtaaga^ 

n-rr a atattttatgaaaaatctgccagctgagtttgcccgttac^gtgagcaacaagtcagcoactacctagctl 
tcIatccI^g^gg^^atg^cto 

TTTTOACAAATCA^AAGTTAATGAA^ 

tatacagacgaaaaattggaaccaagctaccgcttgccagtagatgaaatcatcgagaaaagatag 
lelnkkrhatkhftdklvdpkdvrtaieiatlapsahnsqpwkfvvvreknaelakla.ygsnfeqvssa^ 

VNEVLEIEDRFRPELLITVGYTDEKLEPSYRLPVDEIIEKRZ 
ID7 1401 bp 

atgaca^aattgat^acagcagaagtagaaaaacgcaaagaagacctcttggctgacttgtttagcct^gg 

tggtoatcgagaa^g^ctcggaatctttgcccatatggatc^^ 
^actatS^ga^a^ 

gacgaagaatcaggctgggcagacatggactactactitgagcacgtaggacttgcca^ 
tcaccagatx3cix3aatttccaatcatca 

a^taca^g^^ttg^ccgtcttcacagctttacaggtggtttacgtgaaaatatggtacc 

riTrnTTTCAGGTGACTTGGCTGACTTGCAAGCTAAACTAGATGCCTTTGTTGCAGAACACAAACTTAGAGGAGAAC 
TCAATGGCGCAACTTACCTTGCCCTCTTCCTCAGCC^ 

pSa^GGA^A^ 
ACG^ACACCX:CTCACTATGTGCC^^ 

TOnrTTTAAAGGTCATGAACAAGTCATCGGTGGTGGAACCTTTGGTCGCTTGCTAGAACGCGGAGTTpCCTACGGT 
CAGCAATTTATGCCGAAGCTATTTACGAATTGATCAAATAA 

S£SSa?sm?agT^ 

LLN'TfiKQTGFKGIIEQVIGGGTFGRLLFJUjVAYGAMFPD^ 

nTGTATACTA'rTAlYiAAATCAAATATAAAAAAATTTAGTTTATTAA 
A^TTTATGCAGCAA^ATTAATGCTCTGOTGTTGAA 

TC^AATCTACCAAATGATTGTCTGGTGTGGGATAATATTCCTTGACTGGGTAGTGAAAAATTATCAGGTTGAAGTGA 
TCOVAGAfflTTAATrTA G * G ATTC OA AATA G AGTTGCCACAG AC ATCTCTAACTCTACCT ATC A A G AATTTC ATAG 

AATrfAACTAACCAAAATGAAGClTrTrTAAAATCTAGIGAGACTATATTGAATGGATITGATGTGTTAGC.Gr^CT 



TGAATCTTTTATATGTATTGCCTAAGAAAATTAAAGA " 
^TATrTTGCAATAAAAGGAATAGTGAAAATTGGTACTATTGAAGCAATAGGAGCACTAACAGGT 
ASGSrGGT^A^ 



Xaaatat^gctataagtatggagataaagaaatattaaaaaa 

^^AGGTGAAAGTGGAAGCGGGAAATCTACATTATTAAAATr^ 

ggagaattgcgattctgcggggatgatataaaaaaaacctcct 

ATPi aa A4rrT^ATTTGTTTGAAGGTACGATTAGAGATAATATTT^ 
A^GGGAGAraCTCT^ 

tI^a^^GACG^GGGAACTTCTGCTATCGATAGGAGAACTTCG^ 

caaaggattttatttaa 

MYTTTKSNIKKFSLLTIFIVAGQLLLrYAATm^^ 

™pegS 

SAIDRRTSLAIERKILDREDLTVnVTHAPKFELKQYFTKIYQFPKDFIZ 
ID9 705 bp 

ATAACAGTTAAACAGA^ATGG^GAy^aA^ 

caaatat££E^?^^ 



\TTGAAAAACAAGCCCTCCAAACGGCAGAAAAACAAGAAATAGCCCAT 

g^t^a^^^attctaa?aaaaaatatttactcgcagatcatagcaagttc 

taatgtatcaa^tcttgatactattgtttc 

ttaaagtcatcaagccttaa 

" 

ID10 483 bp

ATCT^CTTGTTGGCCTAATTTAGAATAA . -.._•'.* 






TTTTGAGCCAATTTGCCrrTGGCATGAAGATTGCCT 

CGTGACTTGGTCTTGGCAAATCCTGAAGATGTCAAAGC/^ 
GTCAGGTCTTGAAGTTGGCTGAAATCITCAATGCTCAAGATATTTC 

GGGTGACAAGGCGCGTGTCGTTATCATCACACACAAGATTAAi^G^^CA^ , - u*««A- ^ . - . - - 
GA^G^AGGTTTCAGAATTCGACCTCTTGAATACCTrCAAGGTGCTAGGAGAATAA 

QLENVSAELKKVSEFDLLNTFKVLGEZ 



ID16 1725 bp 

"TTTCTACAGTAGTCTCAGAAAGAAAACGGACCAACTGGTTC 

TCGAGGAAGTCTTTGTTGAGCiCTCCAGAGGATATCCA^^ 

TTTTACAAGTCCAAGAATTGACCTTTACCT^CCT 

ACTCAAGGACAAATTCTAGGTATCATCGGGG^CTGGTTCTGGT 

TITATCCAGTAGACAAGGGGAACATTGACCT^ATCAAAATGGACGTAGTCCTCTT 

TTGGATTGCCTATGTACCTCAAAAGGTCG^ 


AGCTCTI^AAAGCTATTAGAGAAAATTTOCAAACACGA^^ 

GATGGCGGACCAGATTCTCCTGTTGGAAAAAGGTGAGTTGCTAGCTGTTGG^^ 

CAGCCAAGTCTA-iTGTGAAATCAATGCATCCCAACATGGAAAGGAGGACTAG 



MKHLI^YFKPYIK^^ 

SQHGKEDZ 

ID18 1224 bp

AtGA^AtocXCTCGAfcTCAAGAG^ 

GGCTATCTATATAGCCGTTAOTCATGATTATCCCAAT^^ 

GCCTTGGGGCTrGTGAT^'GGTTTTGTGGTCATGCTCTTTA^ 

TATirTAGGCTTGGGAt'mVrGATCTTGCCGATTGTATTITATAATCCAAGCTTAU 
CTCGTGTCATTGTC^^^ 
AGGGCAGATTGCCATTGGGAGTGGTGGCTTATTTGGTCAGGGATTTAATGCTTCGAATCTGCTTATCCCAGTTCGA
GAGTCAGATATGATlTTTACGGTTATTGCAGAAGATTtTGGCT^ , 
C ATGTTG ATTTACCGTATGTTGAAG ATTACTCTTAAATC AAATAAC C AGTTCTAC ACTTATATTTCC ACAGGTTTG A 
TTATGATGTTGCTCTTCCACATCTTTGAGAATATCGGTGCTGTGACTGGACTACTTCCTTTGACGGGGATTCCCTTG 
CCTTTCATTTCGCAAGGGGGATCAGCTATTATCAGTAATCTGATTGGTGTTGGTTTGCTTTTATC
GaCTAATCTAGCTGAAGAAAAGAGCGGAAAAGTCCCATTCAAACGGAAAAAGGT^ 
A 

mkrsldsrvdyslllpvfi^lvigvvaiylwsh^ 
1 0 glmilpiVfynpslvastgaknwvsingitlfqpsefm^ 

VLLALQSDLGTALVFVAJFSGl^LLSGVSWKDIPVFVTAVTGVAGFLAOT 
AOTTTYQQAQGQIAIGSGGLFGQGFNASNLLlPVRESDMUTVIAEDFGn 
STGLMMLLFfflFl^GAVTGLU»LTGIP^ 

ID22 987 bp

ATGGTXjGCTAAGAAAAAAATCTTATTTTTTATGT 

CCATTGTTTCAAATCTGGATCCAGAAAAGTATGATATTGATATTCTTGAAATGGAGCACTTTGACAAGGGATATGA 

ATCTGTTCCAAAGCATGTACGCAT1TTAAAATCCC 

TGGAGAATGAGAATTTATTITCCAAGACTGACTCGTCGT^

TACCATTATGAATCCACCACTGTTGTTCTCTAAAAGAAGAGAAGTCAAGAAGATATCTTGGATTCATGGAAGTATT 

GAAGAACTTCTTAAGGATAGCTCTAAAAGAGAATCACATAGAAGCCAGTTGGATGCTGCGAATACAATTGTAGGG 
ATTTCAAAAAAGACCAGCAATTCTATCAAGGAAGTTTATCCAGATTATA 

GATATGATTTTCAGACTATTCTAGAAAuAATCTCAAGAGAAGATCGATATCGAGATTGCTCCTCAAAGTATCTGTAC 
TATCGGACGGATTGAGGAAAATAAGGGTTCTGACCGTGTAGTGGAAGTGATACGATTATTACACCAAGAGGGAAA
AAACTATCATCTCTATTTTATCGGGGCTGGTGATATGGAAGAGGAACTGAAAAa^ 
TGAGGACTATGTACATTTCCTTGGTTATCAAAAAAATC^ 

TGTCTAAACAAGAAGGTTTTCCTGGAGTGTATGTGGAGGCCTTGAGTCTGGGACTCCCTTTTATCTCTACGGACGT 
TGGAGGGGGTGAGGAATTATCCCAAGAAGGACGATTTGGACAAATCATTGAGAGCAATCAAGAGGCAGCTCAGGC 
GATTACTAATTACATGACTTCTGCCTCAAACTTTGATGTCGATGAGGCTAGCCAATTCATTCAACAATTTAC
ACAAAACAAATCGAACAAGTAGAAAAACTATTAGAGGAGTAG 

MVAKKKJLFFMWSFSLGGGAEKILSTIVSNLDPEKYDIDILEMEHFDKGYESVPKHVR^ 
RIYFPRLTRRLLVKDDYDVEVSFTIMNPPLLFSKRREVKKISW 

EVYPDYTSKLQTIYNGYDFQTILEKSQEKIDlEIAPQSICTIGRffiENKGSDRVVEV
KKRVKEYGffiDYVHFLGYQKNPYQYLSQTK^ 
EAAQAITNYMTSASNroVDEASQeQQFTlTKQffiQVEKLLEEZ 

ID23 1434 bp



ATOGAAACTGCATTAAOTAGTGTGATTGTGCCA 



■*■-^": :! -•^tcAGAAXJCAGACCTATGAAAATC ^.^^a* 

' TGATTC AAT€GtTGAAGAAGATGACAGGGTGTC 



- .r =■ iSGATGGGAfGAAGCAGGGTGACGGGGATtATCTC a™^***^- 

'4 5 ; * j > ' c agaggt^ XtOvtg - . 

• v v , atgatUgcgcacagtcaggcaatcaggatgactatW 

* - > : - r; V A ggtgaA^a^ 

■ - GGGGrtGAT^ACGAAGATGCCTATTAGCAT^ 
. > . - ... 'ccCTATrACTAGTATTTGGATAGAGGGGATAGTATTA i , . 

- 30 :v; jA tatctaccaaa.agttttataatgaagw :.* 

* - v ■ - * ATCGTTTTTTAAAAGGGC AT ' 
v ■ - v : ^GCCCTTATTGATAAATATTTG 



^ KSKKLHZ 



ATGAGAATCAAAGAGAAAACCAATAATATTAATG 

ATTCTCAAAGATATAAATITrGCACTTAACAAGGGTGAAATTGTTGGTCTAGCAGGGAGAAATGGAGTT^ 
AGTACGTTGATGAAAATTCTTGTTCAGAATAATCAACCGACTTCAGGTAATATTATAAGCAGTGATAATGTTGGGT 

ATTTAATCGAAGAACCAAAATTATITTTATCTAAAACAGGTTrAGA 

TGTTGACTAC.AATCAAGAAAGATTTAGATGTTTGATCCAAGAGTTAGATTTGACTCAGTCTATTAATAA^ 
AAGACCTATTCTTTGGGTACAAAACAAAAATTAGCTTTGC 

AGATGAACCGACTAATGGTTTAGATATTGAATCATCACAAATAGTTTTAGCGGTTCTAAAAAAATTAGC^ 

GAAAATGTGGGAATTTTAATATCGAGTCATAAATTAGAAGACATTGAA^ 

AGAACGGGCTTTTGACATTTCAAAAAGTAGGAAAAGATAGTCATAATTTCTTGTTTGAGA^ 

TACAGATAGAGACATTTTCATTACCAAACAAGAATTTTGGGATATTGTTTAG 

MRIKEKTNNINGGIKNVSKHYGHSHLKDIN^ 
KLFLSKTGLENLKYLSNLYGVDYNQERFRCL^^ 

ssqivi^vlkkxalhenvgilisshkledieeicervlflenglltfqkvgkdshnflfeiafssatdrdifitkqe™ 

ID25 1704 bp

ATGACTGAATTAGATAAACGTCACCGCAGTAGCATTTATGACAGCATGGTTAAATCACCTAACCGTGCTATGCTTC 

GTGCGACTGGTATGACAGATAAGGACTTTGAAACATCGATTGTGGGAGTGATTTCGACTTGGGCGGAAAATACAC 

CATGTAACATTCACTTGCATGATTTCGGGAAACTGGCTAAAGAAGGTGTCAAATCTGCAGGCGCTTGGCCTGTACA 

GTTTGGAACCATTACCGTAGCGGACGGGATCGCTATGGGAACGCCTGGTATGCGTTTCTCTCTAACATCTCGTGAC 

ATCATCGCGGACTCCATCGAGGCGGCTATGAGTGGTCACAACGTGGATGCCTTCGTCGCTATCGGTGGCTGTGACA 

AGAACATGCCTGGATCTATGATTGCTATTGCTAATATGGATATCCCAGCTATTTTCGCCTATGGTGGAACTATTGC 

ACCGGGAAATCTTGATGGTAAAGATATCGACTTGGTTTCTGTCTTTGAAGGTATCGGAAAATGGAACCACGGTGAC 

ATGACAGCTGAGGACGTGAAACGTCTTGAATGTAATGCCTGCCCTGGCCCTGGTGGTTGTGGTGGTATGTATACTG 

CTAATACCATGGCAACTGCTATCGAAGTTCTAGGGATGAGTTTGCCAGGGTCATCCTCTCACCCAGCTGAATCAGC 

TGATAAGAAAGAAGATATCGAAGCAGCAGGACGTGCTGTTGTTAAGATGTTGGAACTTGGTCTCAAACCATCAGA 

TATCTTGACTCGTGAAGCCTTTGAAGATGCTATCACTGTAACGATGGCTCTCGGTGGTTCTACAAACGCCACTCTT 

CACTTGCTCGCCATTGCCCATGCCGCAAATGTTGACTTGTCACTTGAGGACTTCAATACGATTCAAGAACGTGTGC 

CTCACTTGGCCGACTTGAAACCATCTGGTCAGTATGTCTTCCAAGACCTCTACGAAGTCGGTGGTGTCCCTGCGGT 

TATGAAGTATTTGTTGGCAAATGGTTTCCTTCACGGAGATCGCATCACATGTACTGGTAAGACTGTAGCTGAAAAC 

TTGGCTGACTTTGCAGACTTGACTCCAGGCCAAAAAGTTATCATGCCACTTGAAAATCCAAAACGTGCGGATGGTC 

CGCTTATCATCTTGAACGGGAACCTTGCTCCTGACGGTGCAGTTGCCAAGGTATCAGGTGTTAAAGTGCGTCGTCA 

CGTTGGGCCAGCTAAGGTCTTTGACTCAGAAGAAGATGCGATTCAGGCCGTTCTGACAGATGAAATCGTTGATGG 

CGATGTAGTCGTTGTTCGTTTTGTTGGACCTAAAGGTGGTCCTGGTATGCCTGAGATGCTATCACTTTCTTCAATGA 

TTGTTGGTAAAGGTCAGGGAGATAAGGTGGCCCTCTTGACGGACGGACGTTTCTCTGGTGGTACTTATGGTCTGGT 

TGTTGGACATATCGCTCCTGAAGCTCAGGATGGTGGACCAATTGCCTATCTCCGTACCGGCGATATCGTTACGGTT 

GACCAAGATACCAAAGAAATTTCTATGGCCGTATCCGAAGAAGAACTTGAAAAACGCAAGGCAGAAACAACC^ 

CCACCACTTTACAGCCGTGGTGTCCTCGGTAAATATGCCCACATCGTATCATCTGCTTCACGCGGAGCCGTGACAG 

ACTTCTGGAATATGGACAAGTCAGGTAAjMlAATAA 

MTELDKRHRSSIYDSMVKSPNRAMLI^ 

titvadgiAmgtpgmrfsltsrdiiadsie^^ 

KDmLVSVFEGIGKWNHGDMTAEDVKRLECNACPGPGGCGGM 

AGRAVVKMLELGLKPSDILTREAFEDAITVTMALGGSTNATLHLLAIAHA^ 

VFQDLYEVGGVPAVMKYLLANGFIJ1GDRITCTC 

VAKVSGVKVRIOrVGPAKVFDSEEDAIQAVLTDEIVDGDVVVVRFVGPKGGPGMPEMLSI^SMIVGKGQGDKVALLTD 

GRFSGGTYGLWGHIAPEAQDGGPIAYLRTGDWTVDQDT^ 

SRGAVTDFWNMDKSGKKZ

ID26 274bp

ATGTTATAATAAAAATAAAGAATTTAAGGAGAAATACAATATGTCAA

AAACGGTTCGTTACATATTGGTCACGCGGCAGCGCTTTTACCGGGGGATATTCTTGCAAGATACTATCGTGAGAAG 

- GGAGAGGAAGTTTTATAT<nTTCT - • 
AAGTCTGt6Ma6aAA^ . .'^ -TI^ 

CYNK^KEFteKYNMSIFIGGAWpYANGS^ 
EIADF YHKEFNP 

ID28 1065bp

- Xt<^<5^Ca^ 
C^ATGTTCGTGAAA]\GTTCrACACCGCATGTGGATGAAGTG 
ACATTCAGAAGGTGTGGATGCACCGCGCGTCTTGGTCGCTTCTCATATGGACGAAGTTGGTTTTATGGTCAGCGAA
ATCAAGCCAGATGGTAGCTrCCGTGTCGTAGAAATCGGTGGCTGGAACCCCATGGTGGTTAGCAGCCAACGTTTCA 
AACTCTTGACTCGTGATGGTCATGAAATTCCTGTGATTTCAGGTTCTGTTCCTCCGCATTTGACTCGTGGAAAGGG 
GGGACCAACCATGCCAGCCATTGCCGATATCGTTTTTGATGGTGGTTTTGCGGACAAGGCTGAGGCAGAAAGT^ 
GGCATCCGTCCTGGTGATACCATTGTACCAGATAGTTCTGCAATTTTGACAGCCAATGAAAAAAATATCATCTC

AAGCTTGGGATAACCGCTACGGTGTCCTCATGGTAAGCGAGCTAGCTGAAGCTTTATCGGGTCAAAAACTcGuCA 
ATGAACTCTATCTGGGTTCTAACGTCCAAGAAGAAGTTGGTCTGCGTGGCGCTCATACCTCTACAACCAAGTTTGA 
CCCAGAAGTCTTCCTCGCAGTTGATTGCTCACCAGCAGGTGATGTCTACGGTCjGTCAAGGCAAGATTGGAGATGG . . 

AACCTTGATTCGTITCTATGATCCAGGTCACrTGCTTCTCCCAGGGATGAAG 
GAAGCTGGTATCAAGTACCAATACTACTGTGGTAAAGGCGGAACAGATGCAGGTGCAGCTCATCTGAAAAATGGT
GGTGTCCCATCAACAACTATCGGTGTCTGCGCTCGTTATATCCATTCTCACCAAACCCTCTATGCAATGGATGACT 
" TCCTAGAAGCGC AAGCTTTCTTACAAGCCTTGGTG AAGAAATTGGATCGTTCAACGGTTG ATTTGATTAAAC ATTA 
TTAA 

MTTLFSKIKEVTELAAVSGHEAPVRAYLREKLTPHVDEVVTO

DGTFRVVEIGGWNPMVVSSQRFKLLTRDGHEIPVISGSVPPHLTO 
VPDSSAILTANEKNnSKAWDNRYGVU^SELAE^ 
AGDVYGGQGKIGDGTLIRFYDPGHLLLPGMKDFLLTTAEEAGIKYQ 
HSHQTLYAMDDFLEAQAFLQALVKKLDRSTVDLIKHYZ 


ID31 1152bp

ATGGAATTTTCTATGAAATCAGTCAAAGGACTACTCTTTATCATAGCT^ 

GAACACTTCTCCCCAATTCATGATTCCAGGACTAGCTTTAACAAGCCTATCTCTGACTTTTATCCTAGCCACTCGTC 
TCCCACTACTAGAAAGCTGGTTTCACAGTTTGGAGAAGGTCTACACCGTCCACAAATTCACAGCCTTTCTCTCAAT
CATCCTACTAATCTTTCATAACTTTAGTATGGGCGGTTTGTGGGGCTCTCGCTTAGCTGCTCAGTTTGGCAATC 
CCATCTATATCTTTGCCAGCATCATCCTTGTCGCCTATTTAGGCAAATACATCCAATACGAAGCTTGGCGATGGAT 
TCACCGCCTGGTTTACCTAGCCTATATTTTAGGACTCTTTCACATCTACATGATAATGGGCAATC GTCTC CTTACAT 
TTAATCTTCTAAGTTTTCTTGTTGGTAGCTATGCCCTTTTA 1 iCTATATC 

AAAAGATTTCCTTCCCCTATCTAGGGAAAATTACCCATCTC^

CCATCTTAGCAGACCTTTCAACTATCAATCAGGACAATTTGCCTTrCTAAAGATTTTCCA^ 

GCTCCGCATCCCTTTTCTATCTCAGGAGGTCATGGTCAAACTCTTTACTTTACTGTTAAAACTTCAGGCGACCATAC 
CAAGAATATCTATGATAATCTTCAAGCCGGCAGCAAAGTAACCCTAGACAGAGCTTACGGACACATGATCATAGA 
AGAAGGACGAGAAAATCAGGTTTGGATTGCTGGAGGTATTGGGATCACCCCCTTCATCTCTTACATCCGTGAACAT 

CCTATTTTAGATAAACAGGTTCACTTCTACTATAGCTTCCGTGGAGATG

GTAACTATGCTCAGAAAAATCCTAATTTTGAACTCCATCTAATCGACAGTACGAAAGACGGCTATCTTAATTTTG 
ACAAAAAGAAGTGCCCGAACATGCAACCGTCTATATGTGTGGTCCTATTTCTATGATGAAGGCACTTGCCAAACA 

gattaagaaacaaaatccaaaaacagagcatatttac 
mefsmksvkgllfiiasfiltllwmntspq™^^

- — ^ -TSFSMSgLWGSRUUVQFGNTLA^ * - . . Va--: 

•77, - ^ w . YAtLGU^GFYHFLYQKISFPYLGKira^ - :.. , ^ 

' :. v : \^; FTVKT^ ' " ' r - 

S,: . y.\ YLDLLRNYAQKNPNFE^ - '^v^^^:;^-;.;, vyy : 

ID32 900bp

r -'-V - - * ATGACTTTTA^TCAGGCTITG^ ^ ' - - 

• : - : * ' ' .*>. GGGGCXAAAGATTGCCATC ATGAGTGACAAGGCGCAGAC AACGCGCAATAAAATCATGGGAATTTAGACGAGTGA- v : r. 7 . , . . ; 
50 " TAAGGAGCAAATTGTCTTTATGG SS± - : : 

.V- r .-.GTeTGCGTAGAGTAGGeTTCGGGAAG .A 7 . 

/.^^ 

.eeT^GAGGGAAATAAGGTGTCTGGTGTAGTGGAf ATTTTG - . ~ -v > - - ■ 

tcaagaaaaactTjGc - 

- :C^sKGniGKGG^^ ^ " v v — r " ' 



ID33 855bp 

CTGCTTCTTGTTTITACAGAAGGAGGACTTATGCCTC 

AATTGATTATAGGAAAGAAGATTTCGAGTATAGAAATTCGCTACCCCAAGATGA^ 

TTCAAAGGGAATTGCCTAGTCAGATTATCGAGTCAATGGGACGTCGTGGAAAATATTTGC'l"l"l"l"l"iATCTGACAGA 
CAAGGTCTTGATTTCCCATTTGCGGATGGAGGGCAAGTATITTTACTATCCAGACCAAGGACCTGAACGCAAGCAT 
^CC 0 ^TGTTTTC r fTTCATTTTG.AAGATGGTGGCACGCTTCTTTATG 
TCTTGGTGCCTGACCTTTTAGACGTCTACTTTATTTCTAAAAAATTAGG 

TTTACAGGTCTTTCAATCTGCCCTTGCCAAGTCCAAAAAGCCTATCAAATCCCATCTCCTAGACCAGACCTTGGTA 

GCTGGACTTGGCAATATCTATGTGGATGAGGTTCTCTGGCGAGCTCAGGTTCATCCAGCTAGACCTTCCCAGACTT 

TGACAGCAGAAGAAGCGACTGCCATTCATGACCAGACCATTGCTGTTTTGGGCCAGGCTGTTGAAAAAGGTG<^T 

CCACC^TTCGGACTTATACCAATGCCTTTGGGGAAGATGGAAGCATGCAGGACTTTCATCAGGTCTATGATAAGAC 

TGGTCAAGAATGTGTACGCTGTGGTACCATCATTGAGAAAATTCAACTAGGCGGACGTGGAACCCACTTTTGTCCA 

AACTGTCAAAGGAGGGACTGA 

MLLVFTEGGIJ4PELPEVETVCRGLEKI-UGKKISSIEIRYPKMI^^ 

RMEGKYFYYPDQGPERKHAHVFFHFEDGGTLVYEDVRKFGTMELLVPDLLDVYFISKKL 

KSKKPIKSHLLDQTLVAGLGNIYVDEVLWRAQVHPARPSQTLTAEEAT^ 

GSMQDFHQVYDKTGQECVRCGTIIEKIQLGGRGTHFCPNCQRRDZ 

ID34 633bp 

TTGTCCAAACTGTCAAAGGAGGGACTGATGGGAAAAATCATCGGAATCACTGGGGGAATTGCCTCTGGTAAGTCA 
ACTGTGACAAATITTCTAAGACAGCAAGGCTTTCAAGTAGTGGATGCCGACGCAGTCGTCCACCAACTACAG 
CCTGGTGGTCGTCTGTTTGAGGCTCTAGTACACKIACTTTGGGCAAGAAATCATTCTTGAAAACGG^^ 
GCCCTCTCCTAGCTAGTCTCATCTTTTCAAATCCTGATGAACGAGAATGGTCTAAGCAAATTCAAGGGGAGATTAT 

CCGTGAGGAACTGGCTACTTTGAGAGAACAGTTGGCTCAGACAGAAGAGATTTTC 

TTTGAGCAGGACTACAGCGATTGGTTTGCTGAGACTTGGTTGGTCT 

TAATGAAAAGGGACCAGTTGTCCAAAGATGAAGCTGAGTCTCGTC^ 

AAGATTTGGCCAGCCAGGrrCTTGATAATAATCKSCAATCAGAACCAGCTTCTTAATCAAGTGCATATC 
GGGAGGTAGGCAAGATGACAGAGATTAA 

MSKl^KEGLMGKnGITGGIASGKSTVTNFLRQQGFQWDADAWHQLQIO'GGRLFEALVQHF 

SLIFSNPDEREWSKQIQGEnREElATLREQIAQTEEIFFMDIPLLFEQDYSDWFAET^ 

DEAESRLAAQWPLEKKKDLASQVLDNNGNQNQLLNQVHILLEGGRQDDRDZ 

ID35 1269bp 

TTGATAATAATGGCAATCAGAACCAGCTTCTTAATCAAGTGC r 

gauattaactggaaggataatctgcgcatxgcctggtxtggtaattttctgacaG 

1VVCCTTTTATGCCCATCTTCGTGGAAAATCTAGGTGTA 

TTCTGTCTCTGCTATTTCGGCGGCGCTCTTTTCTCCTATTTGGGGTATTCTO 
TGATGATTCGGGCAGGTCTTGCTATGACTATCACTATGGGAGGCTTGGCCTTT 
CTTTCTTCGTTTACTAAACGGTGTATTTGCAGGTTITGTTCCTAATGCAACGGCACTG 
AGGAGAAATCAGGCTCTGCCTTAGGTACTTTGTCTACAGGCGTAGTTGCAGGTACTCTAACTGGT 

TGGCTTTATCGCAGAATTATTTGGCATTCGTACAGTTTTC 

GACTATTTGCTTTATCAAGGAAGATTTTCAACCAGTAGCCAAGGAmAAAGGCTATTCCAACA 

tcggttaaatatccctatcttttgctcaatct 
ccctatVitggctctttatgtacgcgacttagggcagacagagaatc 
gtatgggcttttccagcatgatgagtgcaggagtcatgggcaag 
ggttgtcgcccagttitattcagtcatcatctatct 

ATCnTTTCCTCTTTGGATTGGGAACCGGTGCCTTGATTCCCGGGGTTAATGCCCTACTCAGCAAAATG 
AGCCGGCATTTCGAGGGTCXTTGCCTTCAATCAGGTA 

. cTGCAGTAGCAOGTCAAT^^ 

AACCTGATTCA^/mCGA,\CATTA'riAAAAGTAAAGGAAATCTAG 

MIIMALITSFL/KCISFLKEVG^ 
AALFSPIWGILADKYGRKPMMIRAGLA^^ 
TLSTGVVAGTTTGPHGGFIAELFGJRTVH JJVGSFLH^^^ 

FVIOFSAQSIGPII ALYVRDLGQTENLLFVSGLIVSSMGFSSMMSaGVMGKLGDKVGNHRLLVVAQFYSVIIYLLCANAS 

SPLQLGLYRFLFGLGTGAUPGW^ 

AFSCLFNUQFKTIXKYKE1Z 
ID36 1311bp 



ATGGCCCTACCAACTATTGCCATTGTAGGACGTCCCAATGTTGGGAAATCAACCCTATTTAATCGGATCGCTGGTG 

AGCGAATCTCCATTGTAGAAGATGTCGAAGGAGTGACACGTGACCGTATTTATGCAACGGGTGAGTGGCTCAATC 

GTTCTTTTAGCATGATTGATACAGGAGGAATTGATGATGTCGATGCTCCTTTCATGGAACAAATCAAGCACCAGGC 

AGAAATTGCCATGGAAGAAGCAGATGTTATCGTTTTTGTCGTGTCTGGTAAGGAAGGAATTACTGATGCAGACGA 

ATACGTAGCTCGTAAGCTTTATAAGACCCACAAACCAGTTATCCTCGCAGTCAACAAGGTGGACAACCCTGAGAT 

GAGAAATGATATATATGATTTCTATGCTCTCGGTTTGGGTGAACCATTGCCTATCTCATCTGTCCATGGAATCGGT 

ACAGGGGATGTGCTAGATGCGATCGTAGAAAATCTTCCAAATGAATATGAGGAAGAAAATCCAGATGTCATTAAG 

TTTAGCTTGATTGGTCGTCCTAACGTTGGAAAATCAAGCTTGATCAATGCTATCTTGGGAGAAGACCGTGTTATT^ 

CTAGTCCTGTTGCTGGAACAACTCGTGATGCCATTGATACCCACTTTACAGATACAGATGGTCAAGAGTTTACCAT 

GATTGATACGGCTGGTATGCGTAAGTCTGGTAAGGTTrATGAAAATACTGAGAAATACTCTGTTATGCGTGCCATG 

CGTGCTATTGACCGTTCAGATGTGGTCTTGATGGTCATCAATGCGGAAGAAGGCATTCGTGAGTACGACAAGCGTA 

TCGCAGGATTTGCCCATGAAGCTGGTAAAGGGATGATTATCGTGGTCAACAAGTGGGATACGCTTGAAAAAGATA 

ACCACACTATGAAAAACTGGGAAGAAGATATCCGTGAGCAGTTCCAATACCTGCCTTACGCACCGATTATCTTTGT 

ATCAGCTTTAACCAAGCAACGTCTCCACAAACTTCCTGAGATGATTAAGCAAATCAGCGAAAGTCAAAATACACG 

TATTCCATCAGCTGTCTTGAACGATGTCATCATGGATGCCATTGCCATCAACCCAACACCGACAGACAAAGGAAA 

ACGTCTCAAGATTTTCTATGCGACCCAAGTGGCAACCAAACCACCAACCTTTGT CATCTT TGTCAATG 

CTCATGCACTTTTCTTACCTGCGTTTCTTGGAAAATCAAATCCGCAAGGCCTTTGTTT 

TCTCATCGCAAGAAAACGCAAATAA 

MAIJTIAIVGRPNVGKSTLFNR^^ 
EADV1VFVVSGKEGITDADEYVARIG-YKTHKPV 
NLPOTYEEENPDVIKFSLIGRPNVGKSSLINAILGEDRV^ 
NTEKYSVMRAMRAIDRSDVVLMVIN^ 

YLPYAPIlWSALTKQRLHKLPEMlXQISESQNTnUPSAVLNDVM 
EELMHFSYLRFLENQIRJCAFVFEGTPIHLIARKRKZ 

ID37 714bp

ATGACAGAAACCATTAAATTGATGAAGGCTCATACTTCAGTGCGCAGGTTTAAAGAGCAAGAuAATTCCCCAAGTA 

GACTTAAATGAGATTirGACAGCAGCCCAGATGGCATCATCTTGGAAGAATTrCCAATCCTACTCT 

TACGAAGTCAAGAGAAGAAAGATGCCTTGTATGAATTGGTACCTCAAGAAGCCATTCGCCAGTCTGCTGTTTTCCT 

TCTCTTTGTCGGAGATTTGAACCGAGCAGAAAAGGGAGCCCGACTTCATACCGACACCTTCCAACCCCAAGGTGT 

GGAAGGTCTCTTGATTAGTTCGGTCGATGCAGCTCTTGCTGGACAAAACGCCTTGTTGGCAGCTGAAAGCTTGGGC 

TATGGTGGTGTGATTATCGGTTTGGTTCGATACAAGTCTGAAGAAGTGGCAGAGCTCTTTAACCTACCTGACTACA 

CCTATTCTGTCTTTGGGATGGCACTGGGTGTGCCAAATCAACATCATGATATGAAACCGAGACTGCCACTAGAGAA 

TGTTGTCTTTGAGGAAGAATACCAAGAACAGTCAACTGAGGCAATCCAAGCTTATGACCGTGTTCAGGCTGACTAT 

GGTGGGGCGCGTGCGACCACAAGCTGGAGTCAGCGCCTAGCAGAACAGTTTGGTCAAGCTGAACCAAGCTCAACT 

AGAAAAAATCTTGAACAGAAGAAATTATTGTAG 

' MTTsTIKI^kAHTSVRRFKEQ 
-GDLNRAEKGARLHTDTFQPQGVEGLLK 
' ^ l^GVPNQHHDMKPRtPLENW 

ID38 729bp

'ATGACAGAAATTAGACTAGAGCACGTCAGTTATGCCTATGGTCAGGAGA 
' GTGACTTCAGGCGAXGTGGTTTCCATCCTAGGCCCAAGTGGTGTT 
" GG A TTTTAGAAGTrCAGTCAGGGA.GAATTGTCCTTGATO 
vGCAAAAGGATCTGCTC^ 

^rAAGGCAGAAGGTATCTGCCGAGCGGATAAAA 
->GATGAAG^^ 

' gatt^ctK " " ■ — " " r ~ :: :.^*Z' :^ • .J?*: : ^i 

' MTEIRLEHVSYAYGQERILm 

- " eh&wlgnhlplliqk^ 

: -I^TKMELhA^^ 
ATGAACTATTCAAAAGCATTGAATGAATGTATCGAAAGTGCCTACATGGTTGCTGGACATTTTGGAGCTCG^ATC

^Sg^acIgTgcaS 
?tcS™c^1?gSg?ca^ 

g^aIgtcag^Sa^^^^ 
aa?a!2StctcSg^tgg^ 

A^SS^A^SSS!^ 



GTGGAAACAGGCAGCCCAGCTAATCGCA^^ 
S^^A^CArc^ 

£^actca£SaSSc^^ 

^^a^atJ^agg™ 
aacitiaattaqccaaggc^^ 



TATATGGAQAAATnXX^GCTAOTCQTCnxa^ 

cI™cI£aTg1^^^^ 

ATTCCTTTTGACCAGGAAAATATC 

AA^PG^GATgI^^ 
AnrrT^A^TGGCAAG^^TGACTGAAAAAGG 

A^J^OTCTC^^ 

TGATATTGCATAA 

RTOEKVVFHSL^SDHMQEVVKIMVKPLVASLTEKGIDLK^ 
KGDLVAGSTLKIGVKAGQLKFDIAZ 

ID40 1008bp 

iTr . AT A A AA' , ATGGAAAG" T 'GTTrT TA ^VCGCTTGTAACAGCTCTTGTAGCTGTTGTGCTTGTGGCCTGTGGTCAAG 

g~aactS^^?a^^ 

- cISSgIaGAA^^ 

a. a t^p a ^> a'at.^t a \ rrr AnT^r 4 44 a q a r'TTGGTTGGT AAG AAATATGGG AC ATGG AATCj ACUUAAU 1 o a 
Ar^TOC^ATGTTG AAAAGCTTG^G^AGAAT^ 

r?TA^ ATC^Igg5S5aGATC?tAA?^ ,TGTACTTG^GAeTATGTCAAGGAGTITGACTACTATTCACC^G 



GCTACCA^ 

^gcSggtgg^gc™ 

AAGGCrirACCAACGAJV^GTGAAATAA . 
MKKTWKVFLTLVTALVAVV

PEESSSDLVINGKAPFAVYFQDWAKKLEKGAG 

KTLVESQGGDFEKVEKVPNNDSNSITPI^^ 

YLKDNKJEEARKVIQAIKKGYQYAMEHPEEAADILIK^ 

KWDKENGILKEDLTDKGFTNEFVKZ 

ID41 762bp 

TTGATGAGAAACTTGAGAAGTATACTGAGACGACACATTAGTCTATTGGGCTCT 

AGTTAGCAGGTTTTCTTAAACTTCTCCCCAAGTTTATCCTGCCGAC^^ 

GACAGAGAATTTCTCTGGCACCATAGCTGGGCGACCTTGAGAGTGGCTTTACTGGGGCTC 

TTGCCTGTCTTATGGCTGTGCTCATGGATAGTTTGACTTGGCTCAATGACCTGATTTACCCTATGATGGTGGTCATT 
CAGACCATTCCGACCATTGCCATAGCTCCTATCCTGGTCTTGTGGCT AGGTT ATGGGATTTTGCCCAAGATTGTCTT 
GATTATCTTAACGACAACCTTTCCCATCATCGTTAGTATTTTGGACGGTTTTAGGCATTGCGACAAGGA 
ACCTTGTTTAGTCTGATGCGGGCCAAGCCTTGGCAAATCCTGTGGCATTTTA 

TTTATGCAGG T CTGAGGGTCAGTGTCTCCTACGCCTITATCACAACTGTGGTATCTGAGTGGTTGGGAGGTTTTGA 

AGGTCTTGGTGTTTATATGATTCAGTCTAAAAAACTGTTTCAGTATGATACCATGTTTGCCATTATTATTCTGGT 

CGATTATCAGTCTTTTGGGTATGAAGCTGGTCGATATCAGTGAAAAATATGTGATTAAATGGAAACGTTCGTAG 

MMRNLRSELRRHISIXGFLGVI^W 
AVLMDSLTWLNDLnTMMV^^ 

WQILWHFXIPVSLPYFYAGLRVSVSYAFITTVVSEWLGGFEGLGYYMIQSKKL 
YVIKWKRSZ 

ID42 372bp

TTGATTTTTAATCCTATTTGCTGTATGATAAGGGAAAAGAAAGGGGACAGAGAT^ 
TGCGATCTGCTAGTTTTGGTATTGTTACCAGCTTGCCTG 

TTCTTAAAAAATGTCTTTGAATTGGAAGAAGAACTCGAGTTTCAATTGCTTAATAACCAAG 

ACTTTTCAAGTCAACACCTCCCTACAGCCATTGATTTTGACTTTAACCATCCITTCGACCCTCGT^ 

GTACTGGTTTTAGACATGGACGGTAGAGAAACTATCCTCCTCCCAGAAGAAAATGACCTATTTTAA 

MIFNPICCMIREKKGDRDMAFTNTHMRSASFGIVTSLPDDIIDSFWYIIDHFLK^ 
HLPTAIDFDFNHPFDPRYPPRVLVLDMDGRETILLPEENDLFZ 

ID43 1569bp 

ACAGCGGTCTCATTCTATCTATTTTAAGAAAAGT^ 

ATGAAATATTTTGTTCCTAATGAGGTATTCAGTATTGGTAAATTAAAGGTGGGGA 

TTTC AATTTTGGG AAGCC AAGGTATTTTATC GG ATGAAGTTGTTACTAGTTCTTC ACC G A'TOGCTAC AAAAG AGTC 
" ATTG AAGG ACT AATGGTTTAGAT^ 



* AGitoAe agaaitatg^ 



* -ATTGATGATGATCGAGAAC.^ - - m „ A A „ A ~ A A A ~ 

'TTK?CTCAAGGAACTTATTTAGTAAAAG AAATTGTTTTTTTAAAAA 

CTAGAATTCTAAATGGTATAAATA^ 
■;CH:GCAvVGTAGAAtGGGCK;CCAACAGAA-GATATTAGTrATTCTGGTGGTACGATO^ 
'•GAAGAAGGAACTAAAGCAAAAAATCTACCACTTATAA.\TTCTTCAGGTGCATTTGCTATTGGGAATO 

"'.GTAA'CTATAAAAAATGTAACAtTCA^GG 
^ TAGTTGATAATTGTCGTTTTC^GGGCAAG^ 
.^GeA^AG^ 
l/GACTA-TTCAAAAraCGTAT^ 

1 f ACX*TTGTCG AG AG AG AAGGGC 



••AA^TA^ACA" 



VWg&^^^ 

" accaaatattgaa™ 



:£:MTGLrcD^ 



GSKNVLVDNSl^LGQALPKTMKDGQHSKESIQIEPL^^ . 
QTLSTQNPSNIKIQNNHFDNMMYAGVRFTGFTDVLIKGNRFDKKVKGESVHYRESGAALVNAYSYKNTKDLLDLNKQ 

VVIAENIFNIADPKTKAIRVAKDSAECLGKVSDITVTKNVINNNSKETEQPNIELLRVSDNLVVSENS 
ID44 324bp

GTGATGAAAGAAACTCAGCTATTAAAAGGTGTTCTTGAAGGTTGTGT^^^ 

TCATTAATGAAAGAAGGAGAAGAGCGTGTCTCAGTCTTTTGGCAACAATGGGACGATTTGAGTCAAAA 
GGGATTAAGAATGGGGGTTAA 

MMKETQLLXGVLEGCVLDMIGQKERYGYE 
MKEGEERVSVFWQQWDDLSQKYEGIKNGGZ 

ID45 816bp 

ATGAAGAAAATGAAGTATTACGAAGAAACAAGCGCTTTGCTACATGAGTTI^CTGAG 

GAGGAGTTGTGGGAAAGTTTTAATCTTGCTGGATTTCTCTATGATGAAGACTATCTCAGAGAGCAGATCTATTTGA 
TGA^GCTAG^TrTCTCAGAAGCAGAACGAGATGGCATGAGTGCAGAGGATTATCT 

TAATGAAAGAGATTCTCAAGGGAGCACCTCGCAGTTCTATCAAAGAGTCCCTTTTGACGCCAATTCTTGTCCTGGC 

gg^I^cta^a^^ 

ngcaacttcttatttttctgattggatttggacttgtggccacaattttacgaagaagtttagtccaagattcrc 
Sa^tgaaa™^ 

G^St^^GGAGCCTI™^ 

GGTATTTGGAATrGGAAAGAAGCGGTCTTTCGTCCATTTGTCAGTATGATTATTGCCCATCTTGTGGTGjGOT 
GAATCTTTGTCTTGTTCCGTGGGTTTAAGAAGATAAAATGGAGTGAAGTATAG 

VFLTKVIPLAVLFIGIFVLFRGFKKIKWSEVZ 
ID46 348bp 

CTGTTTTTTTATTTATACTCAATGAAAATCAAAGAGCAAACTAGGAA 

^gaggVtg^cgaaactgacgaagtcagctcaaaacatgtt^ 

GCTCAAAACACTGTITTGAGGTTGTAGATGAAACTGACGAAGTCAGCTCAAAACACTGTrrTGAGGTTGTAG 
aISgI^AAGTCA^^^^ 

TAGGGCGACGCTGAGGTGGTTTGAAGAGATrTTCGAAGAGTATTAA 

MFFYLYSMKIKEQTRKLAAGCSKHCFEVVDETDEVSSKHVFEWDETDEVSSKHCFEVV " 
EVSSKHVFEWDETDEVSNHTYGRATLTWFEEIFEEYZ 

ID47 1260bp 

ATGCAGAATCTGAAATirGCCTTTrCATCTATCATGGCTCACAAGATGCGTTCTTTGCTrACTATGATTGGGAT^T 

TATC^CTCTTTCATCAGTTGTTGTGATiATGG^ 
AATCTCAGAAAAATATTAGCGTCTTTTTCTCTCCTAAAAAAAGTAA 

TITrACGGTTTCTGGAAAGGAAGAGGAAGTrCCTGTTGAACCGCCAAAACCGCAAGAATCCTGGGTCCAAGA 

ACSCTAAACTGAAGGGAGTGGATAGTTACTATGTAACCAATTCAACGAA^ . 
GGTTGAGAATGCTAAWrGACAGGTGGAAACAGAACTTACATGGACGCTGTr^G^ 
TAGTCIX3AGAGAGCAAGATTTCAAAGAGTTTGCAAGTGTGATTTTGCTA 
rt\AT<^eeTCAAGAGGDTATTAJVCAAGUTrGTAGA^GTC 

G^fe^AAAWfeAAA^ . 
TnTAATG^AGATGAAATAGCTAATATTGT^^ 

CTGGCACGAAAVy^TGACAGAGCTTGCAGG^T^AeAACAGGGAGAATACCAGGTG 
GCA^AA^^TCAA^A^TCGTTTAGTTTTATGACGACGATTATTAGT^ 

GAAGTGGTGTeATGA,\GATCATGCTGGTTTGGGXGACAGAGCX5CACTCGTGAGATTGGTCTrCGTAAGGC^QG0. 

TGACAATTGCAAGTGGTITAACTOCCTTAGCAGGTTTGmCTGCAAGeraAATAGAAGG^ 
ATCXaTCCXZAGTCGCCCTA™ 

AGGCATCGAAACTTGATCCAATTGAAGCCCTTCGTTATGAATGA






MQNLKFAFSSIMAHKMRSLLTMIGUIGVSSVW 

GKEEEVPVEPPKPQESWVQEAAKLKGVDSYYYTNSTNAILTYQDKKVENAN 

KEFASVILLDEEl^ISLFESPQEAINKVVEVNGFSYRVIGVYTSPEAKRSKIYGFGGLPITTNISLAANFNVDEIANIVFRVN 
DTSLTPTLGPELARKMTELAGLQQGEYQVADESVY

LRKALGATRANILIQFLIESMILTLLGGLIGLTIASGLTALAGLLLQGLIEGIEVGVSIPVALFSLAV 

ASKLDPIEALRYEZ 


ID48 705bp 


CTGATGAAGCAACTAATTAGTCTAAAAAATATCTTCAGAAGTTACCGTAATGGTGACCAAGAACTGCAGGTTCTCA 
AAAATATCAATCTAGAAGTGAATGAGGGTGAATTTGTAGCCATCATGGGACCATCTGGGTCTGGTAAGTCCACTCT 
GATGAATACGATTGGCATGTTGGATACACCAACCAGTGGAGAATATTATCTTGAAGGTCAAGAAGTGGCTGGGCT 
TGGTGAAAAACAACTAGCTAAGGTCCGTAACCAACAA^ 
CTCAATGCTCTGCAAAATGTAGAATTGCCCTTGATTTACGKZAGGAGTTTCGTCTTCAAAACGTCGCAAG

AGGAATATTTAGACAAGGTTGAATTGACAGAACGTAGTCACCATrrACCTTCAGAATTATCTGGTGGTCAAAAGCA 
ACGTGTAGCCATTGCGCGTGCCTTGGTAAACAATCCTTCTATTATCCTAGCGGATGAACCGACAGGAGCCTTGGAT 
ACCAAAACAGGTAACCAAATTATGCAATTATTGGTTGATTTGAATAAAGAAGGAAAAACCATTATCATGG 
CATGAGCCTGAGATTGCTGCCTATGCCAAACGTCAGATTGTCATTCGGGATGGGGTCATTTCGTCTGACAGTGCTC 

AGTTAGGAAAGGAGGAAAACTAA

MMKQLISLKNIFRSYRNGDQELQVLKNINLEVNEGEFVAIMGPS 
QLAKVRNQQIGFVFQQFFLLSKLNALQW^ 

LVNNPSIIl^EPTGALDTKTGNQIMQLLVDLNKEGKTIIMVTHEPEIA^ 


ID49 1200bp 

ATGAAGAAAAAGAATGGTAAAGCTAAAAAGTGGCAACTGTATGCAGCAATCGGTGCTGCGAGTGTAGTTGTATTG 
GGTGCTGGGGGGATTTTACTCTTTAGACAACCTTCTCAGACTGCTCTAAAAGATGAGC 

CCAAGGAAGGAAGCGTGGCCTCCTCTGTTTTATTGTCAGGGACAGTAACAGCAAAAAATGAACAATATGTTTATTT
TGATGCTAGTAAGGGTGATTTAGATGAAATCCTTGTTTCTGTGGGCGATAAGGTCAGCGAAGGGCAGGCTTTAGTC 
AAGTACAGTAGTTCAGAAGCGCAGGCGGCCTATGATTCAGCTAGTCGAGCAGTAGCTAGGGCAGATCGTCATATC 
AATGAACTCAATCAAGCACGAAATGAAGCCGCTTCAGCTCCGGCTCCACAGTTACCAGCGCCAGTAGGAGGAGAA 
GATGCAACGGTGCAAAGCCCAACTCCAGTGGCTGGAAATTCTGTTGCTTCTATTGACGCTCAATTGGGTGATGCCC- 

GTGATGCGCGTGCAGATGCTGCGGCGCAATTAAGCAAGGCTCAAAGTCAATTGGATGCAACAACTGTTCTCAGTA
CCCTAGAGGGAACTGTGGTCGAAGTCAATAGCAATGTTTCTAAATCTCCAACAGGGGCGAGTCAAGTTATGGTTC 
ATATTGTCAGCAATGAAAATTrACAAGTCAAGGGAGAATTGTCTGAGTACAATCTAGCCAACCTTTCTGTAGGTCA 
AGAAGTAAGCTTTACTTCTAAAGTGTATCCTGATAAAAAATGGACTGGGAAATTAAGCTATATTTCTGACTATCGT
AAAAACAATGGTGAAGCAGCTAGTCCAGCAGCCGGGAATAATACAGGTTCTAAATACCCTTATACT 

ACAGGCGAGGTTGGTGATTTGAAACAAGGTTTTTCTGTCAACATTGAGGTTAAAAGCAA

^, ^ ^^^ClGTTAGCA^TCTAGTAATGGATGAg^GTAAS^ * , 

* -v;'' — vaAGTTGAGGTTTCACT^ \ - ^ <rfU.*U«v f 

: ; ~. : ■ : .v ; .; . : tcatc^ ' • w^wr, 
- . . .. • ' -45 ? - ^KjjockNGKAKKW : : ~- r 

•ASKGbLbEILVSVGDKVSEGQALVKYSSSEAQAAYDSASRAVARA^ V. 

>■; : qsptpvagnsvasidaqlgdarda^adaaaqlskaqsqldattvi^tlegtw . • : >-;; 

: - " * \* LQVKGELSEYm^IlSVGQEVSCT^ 7 rrr"-:.; — T! 

■ : ;, GFsv^nEVKSiK'FKAiLYPYssLyM - - - ; 

7. .5.0L..' YKAfijEATNZj.:.: *..I;';:V-s^".- - v^v:v ... . . ~ L. 

» * ■ 7-vy ;7j^^ * / ' ^ 

... , ~~.-™- A r rG ^^^ - 

v r • ' "~ ^"T^^ggtga^ ^ 

* • - .-r~*r*-^ ■*? axTi^^^^ v: : , S* 

tgttgtggtgaatcacttgaa^ : 

^y^'T::;^:.,;^^ ATGAGGCTCA^ ; j 

-r ^ ' .^^^^^ 
MSRKPFIAGNWKMNKNPEEAKAFVEAVASKLPSSDLVEAGIAAPALDL^

VGGASLEAESFLALLDFVKZ 



ID51 1473bp 

AAGGCAACTACTATTTGACTGGAAGTGGTGCCATGGCGACTGACGAAGTC 

=^=£^== 

cSgSIaA^A^S^ 

^S^A^C^^^ 

GGAATCCAAGGGCGCGTAGATGTCAGCGTTTGGTATTAA 

WENPHYSGKKGWQYTSSEYMKGIQGRVDVSVWYZ 
ID52 774bp 

a tc a a A A A atttGCCAACCTTTATCTGGGACTGGTCTTTCTGGTCCTCTACCT 
GGGAGACTCATGCTGATTTTGGCTCAGACATTTTTC^ 

GGCGACATGATTCATGCGGCCTATGACTTGGGAGCTAGTCAAT^CAGATGTTCAAGG^ 

TGACTCCGTCTATCATTACTGGTTATTTCATGGCCTTCACCTATTCGTTAGATGACTrTGCCGTGACC 1 1 c J_* ~^ ,11 
CAAGCATGA 



MKKFANLYLGLVFLVLYLPIFYLIGYAFNAGDDMNSFTGFSWTHFETMFGDGRLN^^ 



^^^^P^PPP^^pATTATTCAAAAATTGAAGuAAiC 

gacc"aggVaa C ?^ 



AAGCGCCTGAGCATTGGGATGACCTTTGGAAGCCGGAGTATAAGAATTCTATCATGCTCTTTGATGGGGCGCGTGA 

GGTGCTGGGACTAGGACTCAATTCCCTCGGCTACAGCCTCAACTCCAAGGATCTGCAGCAGTTGGAAGAGACAGT 

GGATAAGCTCTACAAACTGACTCCAAATATCAAGGCTATCGTTGCGGACGAGATGAAGGGCTATATGATTCAGAA 

TAATGTTGCAATCGGCGTGACCTTCTCTGGTGAAGCCAGCCAAATGTTAGAAAAAAATGAAAATCTACGTTATGTG 

GTACCGACAGAGGCCAGCAATCTTTGGTTTGACAATATGGTCATTCCCAAAACAGTTAAAAACCAAAACTCAGCC 

TATGCCTTTATCAACTTTATGTTGAAACCTGAAAATGCTCTCCAAAATGCGGAGTATGTCGGCTATTCAACACCAA 

ACCTACCAGCGAAGGAATTGCTCCCAGAGGAAACAAAGGAAGATAAGGCCTTCTATCCCGATGTTGAAACCATGA 

AACACCTAGAAGTTTATGAGAAATTTGACCATAAATGGACAGGGAAATATAGCGACC^ 

TGTATC G G AAGT AG 



MKKIYSFLAGIAAIILVLWGIATHIJDSKINSRD^ 
TTYDIAIPSEYMINKMKDEDLLVPLDYSKIEGffiMGPEFLN^^ 

kpeyk^simlfdgarevlglglnslgyslnskdlqqleetv^ 

QMLEKNENLRYVVPTEASNLWFDNMVIPKTVKNQNSAYAFI^ 
KAJFYPDVETMKHLEVYEKFDHKWTGKYSDLFLQFKaMYRXZ

ID61 1851bp

ATGAATAAAAAACTAACAGATTATGTGATTGATCTGGTGGAAATTTTA AATAAACAA CAAAAGCAG^ 
GGAATATTTGATATTTrCAGTATGGTGGTTTCCATCATTGTATCTTATATTTTATT^

ACCTGTTGACTACATTATCTATACGAGTTTGGCCTTCCTGTTCTATCAATTGATGATTGGTTITT 
CGAGCATTAGTCGTTACAGCAAGATTACGGATTTCATGAAAATCTTTTTTGGTGTGACTGCTAGCAGTGTCTTGTC 

ATATAGTATCTGTTATGCCTTCTTGCCACTCTTCTCCATCCGTTTC 

GATTTTATTGCCACGGATTACTTGGCAGTTAATCTACTCCAGACGCAAuAAAAGGTAGTGGTGATGGAGAACACCGT 
CGGACCTTCTTGATTGGTGCCGGTGATGGTGGGGCTCTTTTTATGGATAGT^
AACTGGTCGGTATTTTGGATAAGGATTCTAAGAAAAAGGGTCAAAAACTTG 

TGACAATCTGCCTGAATTAGCCAAACGCCATCAAATCGAGCGTGTCATCGTTGCGATTCCGTCGCTGGATCCGTCA 
GAATATGAGCGTATCTTGCAGATGTGTAATAAGCTGGGTGTCAAATGTTACAAG ATGC CTAAGGTTGAAACTGTTG 
TTCAGGGCCTTCACCAAGCAGGTACTGGCTTCCAAAAAATTGATATTACGGACCTTTTGGGTCGTCAGGAAATC 
TCTTGACGAATCGCGTCTGGGTGCAGAACTGACAGGTAAGACCATCTTAGTCACAGGAGCTGGAGGTTCAATCGG

TTCTGAAATCTGTCGTCAAGTTAGTCGC^ 

TACCTTGTTTATCATGAATTGATTCGTAAGTTCCAAGGGATTGATTATGTACCTGTGATTGCGGACATTCAAGA 
ATGATCGTTTGTTGCAAGTCTTTGAGCAGTACAuAACCTGCTATTGTTTATCATGCGGCAGCCCACAAGCATGTTCC 

TAtGATGGAGCGCAATCCAAAA'GAAGCCTTCAAAA^ 
TGAAGCTAAAGTGTCTAAGATGGTTATGATTTCGACAGATAAGGCAGTCAATCCACCAAATGTTATGGGAGC

CAAGCGCGTGGCGGAGTTGATTGTGACTGGCTTTAACCAACGTAGCCAATCAACCTACTGTGCAGTTCGTTTTGGG 

aatgttcttggtagccgtggtagtgtcattccagtctttgaacgtcagattgctgaaggtgggcctgtaacggtga 

GAGACTTCCGTATGACGCGTTACTTTATGACCATTCCAGAAGCTAGCCGTCTGGTTATCCATGCTGGTGCTTATGC 
CAAAGATGGGGAAGTCTTTATCCTTGA 
4(L__^TaAGTqGO!AC^ 
^ ., • CTCTTGGTATC^^ 



' * ' dcTAATCAAAC ' \ : - " : " ' : *, 7 -. ' \ • . 

"45 ; - 

^ ^SKITDFMKIFFGV^ 

; ' MbSYQHP^ ' ' 

— V "' * ! MPKVETVVQGLHQAGTGFQ " 
*• : - "isr^YHELI^ * 
-50 --^ - -XkVSKMV-MISTD^ - 

; - : - mtryfWtipeasrlvihagayak^ d 



■ ; *. '"\--'DNQ . _. , . ^ r _ _ x 




CTTGTATTGAAGATGTGACGCAGGAGAGGGCTGTCATTCA^ 
AATATGCAGTTGATTTTGGAAAGTGATAATGTGCGTACTAAGAAGATCATCATTCCAAATAAGGCGACTTATGAGC

GCGCTTTAGAGTTAACTGACGAGAAATACCATGATCAGTTTGTGCACTTGGGTTATCATTACCAGTTCAAACGTGA 

TAATTTCCTAAGACGAGATGCCTTAATCTTGACCAATTCAGATCAGATTGAGCAAGTAGAAGCAATCGCAGGAGC 

CTTGCCTGATGTCACTTTCCGTATTGCAGCGGTGACAGAGATGTCTTCTAAGCTCTTAGACATGCTTTGCTATCCTA 

ATGTGGCCCTTTACCAGAACGCTAGTCCACAGAAGATTCAGGAGCTGTATCAACTGTCGGATATTTACTTGGATAT 

AAACCACAGTAATGAGTTGCTACAGGCAGTGCGTCAGGCCTTTGAGCACAATCTClTGATTCTTGGCTuTAATCAU 

ACGGTGCACAATAGACTTTATATCGCTCCAGACCATCTATTTGAAAGTAGTGAAGTTGCTGCTTTGGTTGAGACCA 

TTA^TTGC<:CCTTTCAGATGTTGA^ 

CTTGGTGAGATATCAGGAAACCATGCAAACTGTTTTAGGAGGCTAA 



MlELYDSYSQESRDLHESLVATGI^QLGWmADGFLPDGLLSPFTYYLGYEDGKJPLYFNQWVSDFWEILGDNQSACIE 
DVTQERAVMYADGMQARLVKQVDWKDLEGRVRQVDH^ 
GDILLTLPGQSMRYFANKVEFnTFLQDLEIDTSQ 

NVRTKK1IIPNKATYERALELTDEKYHDQFVHLGYHYQFKRDNFLRRDALILTNSDQIEQVEAIAGALPDVTFRIAAVTE 

MSSKUJDMLCmrVALYQNASPQKIQELYQl^DIYLDINHSNElX

EVAALVETIKLALSDVDQMRQALGKQGQHANYVDLVRYQETMQTVLGGZ 

ID102 1512bp 

ATGACAATTTACAATATAAATTTAGGAATTGGTTGGGCTAGTAGCGGTGTTGAATACGCTCAAGCCTATCGTG^
GTGTTTTTCGGAAATTAAATCTGTCCTCTAAGTTTATCTTTACAGATATGA 

ACAGCCAATATTGGTTTTGATGATAATCAGGTTATCTGGCTTTATAATCATTTCACAGATATCAAAATTGCAC 
CTAGCGTGACAGTGGATGATGTCTTGGCTTACTTTGGTGGTGAAGAAAGTCACAGAGAAAAAAATGGCAAGGTTT 

TACGTGTATTCTTTTTTGACCAAGATAAGTTTGTAA 
GCCGAGTATGTrTTTAAGGGAAACCTGATTCGGAAGGATTACTTTTCTTATACGCGTTATTGT
TCCCAAGGACAATGTTGCAGTCTTATACCAACGAACTTTTTATAATGAAGACGGGACTCC 
ATGAATCAAGGGAAGGAAGAAGTTTATCATTTCAAGGATAAGATTITCTAT 
TTATGAAATCTTTGAATTTGAATAAGTCTGATTTGGTCA 

TGAGGAAGCACAGACAGCACATCTAGCGGTAGTTGTTCATGCGGAGCATTATAGTGAAAATGCTACAAATGAGGA 
CTATATCCTTTGGAATAACTATTATGACTATCAGTTTACCAATGCAGATAAGGTrGACTTCTTTATCGTGTCTACTG
ATAGACAAAATGAAGTTCTACAAGAGCAATTTGCCAAATATACTCAGCATCAGCCAAAGATTGTTACCATTCCTGT 
AGGCAGTATTGATTCCTTGACAGATTCAAGTCAAGGGCGCAAACCATTTTCATTGATTACGGCTTCACGTC 
AAAGAAAAGCACATTGATTGGCTTGTGAAAGCTGTGATTGAAGCTCATAAGGAGTTACCGGAACTAACCTTTGAT 
ATCTATGGTAGTGGTGGAGAAGATTCTCTGCTTAGAGAAATTATTGCAAATCATCAGGCAGAGGACTATATCCAAC 
TCAAGGGGCATGCGGAACTTTCGCAGATTTATAGCCAGTATGAGGTCTACTTAACGGCTTCTACCAGCGAAGGATT
TGGTCTGACCTTGATGGAAGCTATTGGTTCAGGTCTACCTCTAATTGGTTTTGATGTGCCTTATGGTAATCAGACCT 
TTATAGAGGATGGGCAAAATGGTTATTTGATTCCAAGTTCATCTGACCATGTAGAAGACCAAATCAAGCAAGCTTA 
TGCCGCTAAGATTTGTCAATTGTATCAAGAAAATCGTTTGGAAGCTATGCGTGCCTATTCTTACCAAATT 
GGCTTCTTGACCAAAGAAATTTTAGAAAAGTGGAAGAAAACAGTAGAGGAGGTGCTCCATGATTGA 


~~ MTIYN^GIGWASSGVE^AQAYRAGVFRKLNLSSKFIFT^ 

■-* Vddvi^yfggeeshrekngkvlrvfffdqdki^tcylvdenkdlvqhaeyvfkgnlirkd^ 
• vavlyqrtfynedgtpvydilmnqgkeevyhfk^^ 

HLAWVHAEHYSENATNEDYIL^ 
SQGRKPFSLITASRI^CEKHmWLVKAVIEAHKELPELTFDIYGS

VYLTASTSEGFGETIJtfEAIGSGLPLIGFDVPYGNQTFIEDGQNGYL 
YSYQIAEGFLTKEILEKWKKTVEEVLHDZ 




ID103 2292bp

ATGTrrTCTCTrrcGGATc^

TAGACGATATTTTGGTTGAAGC^ 
'"■ TGrrCAAGTCATGGGAGCTATTGlCATGCACTATGGAAATGTTGCTGAGATGAATACGGGGGAAGGTAAGACC 

* GACAGCTACCATGCCTGTC^ 

TCAAAGCGTGATGCCGAGGAAATGGGTCAAGTTTATCG^TTTTCTAGGATTGACCATTGGTGTACCATTTACGGAAG 

. 60 ATCCAAAG-AAGGAGATGAAAGCTOAAGAAAAGAAGCTTATCT^ 

ATiTAGGTTTTGATTATCTAAATGATAACCTAGCCTCGAATC 
- " GATTATTGATGAAATTGATGATATCTT£C 

^ i~ ■ AGTrTAATl'ACTATGCGATCATTbATACACTTGTAACAA 

■. GAAAGAGGAGGTTTGGGTCACTAGTAAGGGGGCCAAGTCTGCTGAGAATTTCCTAGGGATTGATAATTT • 
GGAAGA(K:ATGCGTCTTTrGCTC

tatatcattcgtggaaatgagatggtactggttgataagggaacagggcgtctaatggaaatgactaaacttcaa 
ggaggtctccatcaggctattgaagccaaggaacatgtcaaattatctcctgagacgcgggctatggcctcgatc 
acctatcagagtctttttaagatgtttaataagatatctggtatgacagggacaggtaaggtcgcggaaa^ 
tttattgaaacttacaatatgtctgtagtacgcattccaaccaatcgtccgagacaacggattgactat^^

atctatatatcactttacctgaaaaagtgtatgcatccttggagtacatcaagcaataccatgc i AAGuuAAATCC 
tttactcgtttttgtaggctcagttgaaatgtctcaactctattcgtctctct^ 

atgtcctaj^tgctaataatgcggcgot - - 

tggctacctctatggcaggacgtggtacggatatcaagcttggtaaaggagtcgcagagcttgggggcttgatt^ 

ttattgggactgagcggatggaaagtcagcggatcgacctacaaattcgtggccgttctggtcgtcagggagatc
ctggtatgagtaaattttttgtatccttagaggatgatgttatcaagaaatttggtccatcttgggtgcataaa 
gtacaaagactatcaggttcaagatatgactcaaccggaagtattgaaaggtcgtaaataccggaaactagtcga 
aaaggctcagcatgccagtgatagtgctggacgttcagcacgtcgtcagactctggagtatgctgaaagtatgaa 
tatacaacgggatatagtctataaagagagaaatcgtctaatagatggttctcgtgacttagaggatgttgttgtg 

gatatcattgagagatatacagaagaggtagcggctgatcactatgctagtcgtgaattattgtttcactttattg
tgaccaatattagttttcatgttaaagaggttccagattatatagatgtaactgacaaaactgcagttcgtagct^ 
tatgaagcaggtgattgataaagaactttctgaaaagaaagaattacttaatcaacatgacttatatc 
ttacgactttcactgcttaaagccattgatgacaactgggtagagcaggtagactatctacaacagctatccatgg 
ctatcggtggtcaatctgctagtcagaaaaatccaatcgtagagtactatcaagaagcctacgcgggctttgaag 

ctatgaaagaacagattcatgcggatatggtgcgtaatctcctgatggggctggttgaggtcactccaaaaggtg
aaatcgtgactcattttccataa 

mssi^dqelvaktvefrqrlsegeslddilveapavvread 
atmpvyij^apsgegvmvvtpney1^kiu)aeemgqvyrflgltigvpftedpkxem 

YLNDNLASNEEGKPLRPFNYVnDEroDILLDSAQTPLIIAGSPRVQSNYYAITOTLVTTLVEGE
AKSAENFLGIDNLYKEEHASFARHLVYAIRAHKLFTKDKD^ 

KLSPETRAMASITYQSLFKMFNKISGMTGTGKVAEKEFIETYNMSVVRIPTNRPRQRIDYPDNLY 

YHAKGNPLLVFVGSVEMSQLYSSLLFREGlAHNVLNANNAAREAQnSESGQMGAVTVATSMAGRGTDIKXGKGVAEL 
GGLIVIGTERMESQRIDLQIRGRSGRQGDPGMSKPFVSLEDDVIKKFGPSWVHKKYTCDYQVQDM 
VEKAQHASDSAGRSARRQTLEYAESMNIQRDr/YKERNRLIDGSRDLEDVVVDIIERYTEEVAADHYASRELLFHFIVTN

ISFHVKEVPDYIDVTDKTAVRSFMKQVIDKEI^E 

ASQKNPIVEYYQEAYAGFEAMKEQIHADMVRNLLMGLVEVTPKGEIVTHFPZ 
ID104 879bp


ATGAAACAAGAATGGTTTGAAAGTAATGATTTTGTAAAAACAACAAGCAAGAACAAGCCTGAAGAGC 

GAGGTTGCAGACAAGGCTGAAGAAAGGATACCCGATCTCGATACACCAATTGAAAAAAATACTCAGTTAGAGGAG 

GAAGTCTCTCAAGCTGAAGTCGAATTGGAAAGCCAGCAAGAAGAGAAAATTGAAGCTCCTGAAGACAGTGAAGC 

- - : •' GAGAXCAGAAATAGAAGA^^ /- : 7 

40 'aagtcactatagct ' _ 

- ~- - "ctaXatcttt^ ' ^ 

v, -~ * - ' . i ATOTOFGGf CTTGGGT^ - : T«-fU ^ * ^ • 

- ~'':V . ^A'GCCTTT . 

. ■»;.".'-..' . . AGATATAGeAAGCATTAAGAGTCGCTTCGCTGAGGAGGTAGC -f ' <*' 

■ : 45-- ' -tagcgac^Xg^ * 

r * ■ 'v: r ' / ctggacgctagagaaggtttctccaAcaAt^ > - ■ 

-.-v * - tc^ " ••-^ : ;:-.:-^>x'-" ^ • y* 

y::. . f .v.; : *\ mkqewfes^ - ^ ' . - *' ; : '■. 

- ' EKKASNSTEEfePDI^KETC^ J" _ r 

mi. . • - • - PTSK£ETsiTHSYTA , .": 

^ - . * ji)I06^32^p ^^^ *r J r~ *-*^-~~ ^ '.V^xf^ -^sv^r-^.^-: 4^—*** -^--v.-h-- — ■ . ^ ^-*->*-*- >- T .-*- : - ^^'vn^ >-;^: fr?r '^.^r ^'v- 

* ~ ' GTATAGAGCAGCTCTTGA^ ' ' " ' 

i&tei- v ±60* ^'QT^GGGGGATGACA^OAAGTOA-^^-;--* ^ >«-• -v - v ^$^^^,^^ ? 5^ s a f I• - ^■►■■^•w^.;r^*-e-^^^-'e^^ 

^ ^-^::r* : ^^;^i^^y^ssa^ 
ID108 954bp

ATGGATTTTGAAAAAATTGAACAAGCTTATATCTATTTACTAGAGAATGTCCAAGTCAT^ 

CCAACTTTTATGACGCCTTGGTGGAGCAAAATAGCATCTATCTGGATGGTGAAACTGAGCTAAACCAGGTCAAAG 
ACAACAATCAGGCCCTTAAGCGTTTAGCACTACGCAAAGAAGAATGGCTCAAGACCTAC^AGTTTCTCTTGATG^

AGGCTGGGCAAACAGAACCCTTGCAGGCCAATCACCAGTTm^ 

TGTGGAAGAGTTGTTTAAAGAGGAGGAAATTACTATCCTCGAAATGGGTTCTGGGATGGGAATTCTAGGCGCTATT 

TTCTTGACCTCGCTTACTAAAAAGGTGOA_TTACTTGGGAATGGAAOTGGATGA.TTTGCTGATT 

GCATGC<:AGATGTAATTGGTTTGCAGGCTGGCTTTGTCCA-AGGAGATGCCGTTCGCCCACAAATGCTCAAA 

GCGATGTGGTCATCAGTGACTTGCCTGTCGGCTATTATCCTGATGATGCCGTTGCGTCGCGCCATCAAGTTGCTTC
TAGCCAAGAACATACTTACGCCCATCACTTGCTCATGGAACAAGGGCTTAAGTACCTCAAGTCAGACGGATACGC 
TATTTTTCTAGCTCCGAGTGATTTGTTGACCAGTCCTCAAAGTGATTTGTTAAAAGAATGGCTGA 
AGTCTGGTTGCTATGATTAGTCTGCCTGAAAATCTCTTTGCT^ 
GAAGAAAAATGAAATAGCAGTAGAGCCTTTTGTTTATCCACTTGCTAGCT^ 

TTTAAAGAAAATTTTCAAAAATGGACTCAAGGTACTGAAATATAA

MDFEKIEQAYIYLLENVQVIQSDLATNF^VLVEQNSIYLDGETEL^ 
QTEPLQANHQFTPDAIALLLVFWEELFKEEErriLEMGSGMGILGAIFLTSLTKK^DYLG 

QAGFVQGDAVRPQMLKESDVVISDLPVGYYPDDAVASRHQVASSQEHTYAHHLLMEQGLKYLKSDGYAIFLAPSDLLT 
SPQSDLLKEWLKEEASLVAMISLPENLFANAKQSKT^
Z 

ID110 1902bp 

ATGATTATTTTACAAGCTAATAAAATTGAACGTTCTT

TTGATGAACGAGATCGGATTGGTCTTGTTGGGAAAAATGGTGCAGGTAAGTCTACTCTTTTGAAGATITTAGTTGG 
AGAAGAGGAGCCAACTAGCGGAGAAATCAATAAGAAAAAAGATATTTCTCTGTCTTACCTAGCCCAAGATAGCCG 
TTTTGAGTCTGAAAATACCATCTACGATGAAATGCTTCATGTCTTTAATGATTTGCGTCGGACGGAGAGACAACTG 
CGTCAGATGGAGCTGGAGATGGGTGAAAAGTCTGGTGAGGATTTGGATAAACTG ATGTC AGATTATGACCGCTTA 

TCTGAGAATTTTCGCCAAGCAGGTGGCTTTACCTATGAAGCTGATA

ACGAGTCTATGTGGCAGATGAAAATTGCTGAGCTTTCTGGTGGTCAAAATACTCGTTTGGCACTTGCCAAAATGCT 
CCTTGAAAAGCCCAATCTCTTGGTCTTGGACGAGCCAACTAACCACTTGGATATTGAAACCATCGCCTGGCTAGAG 
AATTACTTGGTAAACTATAGCGGTGCCCTCATTATCGTCAGCCACGACCGTTATTTCTTGGACAAGGTTGCGACAA 
TTACGCTAGATTTGACCAAGCATTCCTTGGATCGCTATGTGGGGAATTACTCTCGT^ 

AAAGCTAGTTACTGAGGCAAAAAACTATG^\AAAGCAACAGAAGGAAATCGCTGCTCTGOAAGACTTTGTCAATCG
CAATCTAGTTCGTGCTTCAACGACTAAACGTGCTCAATCTCGCCGTAAACAACTAGAAAAAATGGAGC GTTTG GA 
CAAGCCTGAAGCTGGCAAGAAAGCAGCCAACATGACCTTCCAGTCTGAAAAAACGTCGGGCAATGTTGTTTTGAC 
TGTTGAAAATGCAGCTGTTGGCTATGACGGGGAAGTCTTGTCACAACCT^ 

GCTGTCGCTATCGTTGGTCCAAATGGTATCGGCAAGTCAACCTTTATCAAGTCTATTGTGGACCAGATTCCTTTTAT 

CAAGGGAGAAAAGCGCTTTGGCGCTAATGTTGAGGTTGGTTACTATGAC

TaAT ACG^i TGC i GG A i oaAC TC rGGAATGATITC AAACTGACACC AGAAGT f G AAATCCGCAACC&lCTTGC^C 
CTTCCTTTTCTCAGGAGATGATGTTAAAAAATCAGTCGGCATGCTATCTGGTGGCGAAAAAGCTCGT^ 
GCTAAATTGTCTATGGAAAACAATAACTTTTTGATTGTGGATGAGCCGACCAACCACTTGGATATTGATAGTAAGG 
AAGTGCTAGAAAATGCCTTGATTGACTTTGATGGAACCTTGCTGTTTGTCAGTCATGATCGTTACT^ 

GTGGCAACTCATGTTTTGGAATTGTCTGAGAATGGTTCAACTCTCTACCTTGGAGATTACGACTACTATGTTGAGA

AGAAAGCAACAGCAGAAATGAGTCAGACTGAGGAAGCTTCAACTAGCAATCAAGCAAAGGAAGCAAGTCCAGTC 
AATGACTATCAGGCCCAGAAAGAAAGTCAAAAAGAAGTTCGCAAACTCATGCGACAAATCGAAAGTCTAGAAGCT 
GAAATTGAAGAGCTAGAAAGTCAAAGCCAAGCCATTTCTGAACAAATGTTGGAAACAAACGATGCCGACAuAACTC 
ATGGAATTACAGGCTGAGCTGGACAAAATCAGCCATCGTCAGGAAGAAGCTATGCTTGAGTGGGAAGAATTATCA 

GAGCAGGTGTAA

MIILQANKJERSFAGEVLFDNINLQVDERDRIALVGKJ^GAGKSTLLKILVGEEEPTSGE 
IYDEMLHVFNDLRRTERQLRQMELEMGEKSGEDIJDKL^ 
IAELSGGQNTRLALAKMLIJsKPN^ 
YVGNYSRFVELKEQKLVTEAKNYEKOQKEIAAL^^
. ' QSEKTSGNVVLTVENAAVGYBGEVLS^ 
" ' TQSKLTPSNTVLOELWNDFKLTPFVBIRh^ 
■ iDSKEVLENALIDFDGTLLFVSHDRYFINRVATHVLE 

VNDYQAQKESQKEVRKXMRQIESLEAEIEELESQSQAISEQMLETNDADKLMELQAELDKISHRQEEAMLEWEEl^EQ^ 

Z

ID111 1179bp



LPMGGGIEVTV AQHTNGYQTLANNGV YHQKHVIJ 



SKIEAADGRVVYEYQDKPVQVYSKATATIMQGLLREVLSSRV 



TTT 



GWIGHDDNHSLSRRAGYSNNSNYMAHLVNAIQQAS 



GSLPTPSSSSSSSSSSSDSSNSSTTRPSSSRARR2 
ID114 1974bp

ACTGAGAGTGCATTATACAAGGAGTGATG^G^ 
AGCTCTATTGGAACAGATGCCTGTAGGTGTTATGAAA^ 
TATGCTGAATTGATrTTGACCAAGGAAGATGGTGATmGA^ 
TAGGAAATCCGTCTACTTATGCCAAGCTTGGTGAGAAGCGTTATGCTGTTC 

TATTTTGTAGATGTATCCAGGGAACAAGCCATAACAGATG^ 
CTGTGGATAATTATGATGATTTGGAGGATGAAACT^^ 
TTTTATATCAGAGTTTrCAGAAAAACACATGATGT^^ 
ACTACACGGTGCTTGAGGGCTTGATGAATGATAAATITTCTG^ 

AGAGGCTAAGGACATGAAATGCTATGATACAGTTGTTATTAGTAAGGCAGCA 
TATTGAAGCGAGTmGTrCTTGC^ 



MKKFYVSPIFPILVGUAFGV^^ 



FSVIPAPREESKQRQLPLTLSMGFSYGDGNHDEIG^^ 



tRTRAMMT/ 
LSVKDAMGJ* 
KNRLSRMQ^ 
AKDMKCYDTVV 
GEKLTEIVLNEMKEKEKEEZ 

ID115 663bp

ATGAAGTGCTTGTTATGTGGGCAGACTATGAAGACTG^ 

ArtCTTGTCTTTGTTCAGACTGTGATTCTACnTTTGAAAGAATTGGGGAAGAGAAU^ 

ACTTACAATCAAGCTATGAAGGATTTTTTCAGTCGG^ 
CTTCATTTrTAAGTGAGGAGTTGAAAAAGT^^ 

k i ^ A f . * a rrrA a tt & orvrrr; a rcG^CTTGGTAGAGvjC/ vo^ AGGC 1. 1 1 UAVj l aj 1 _L ~* * * A ™~ 

actg^SSa^ctggtgctaaggatgtaaaaaca™ 

rt SSSKNRSERLGTELPFFIKSGVTIPKKILLIDDIYTrGATINRVKKLLEEAGAKDVK.rF5>LVRZ 



ID116 l2S»9bp
ATGAAAGTAAATTTAGATTATCTCGGTCGTrrATTTACTGAGAA^
A^AAA^^CCAGCAATGAGAAAGGAGAAGGGGAAACTTTTCT 

GG^^CC^GTGC^ 
5 rTATrTTCCGCAGGAGGATTTTCCAAAGCAAGATGTTCTC 

5 S^gTggg^g^ 

ACAGAAATGATTTATCAAGTAGTGGCTAAAGTGATCAATGCGGGTGGTGCAGTC^ 
ATGTrroTTn^GA^eTGTACAAGCGCCTGCAACAGGATTTTTC^GC^ 

AAGACTGAATITACCGAGACGGTTTCATGGAAATCCGTTGATrATTCCAAAACCAATTTGGTTATCGGAllll^l 
C^CTACTTA^3ACAAGAATCGTTTGTCACC 

a^Sa¥cactacgacaatcttggagcgcggagttacc^ 
atcgtttgt^^accaagtctagtttgattcagattggtggacgagt^ 

atttocttttcttoca^gatgggttaaatgc^ 

ggctggtctatga 

20 MXVNLDYLGRLFTENELTEEERQLAEKLPAMRXEKGKLFCQRCNST^ 
POEDITKODV^WRGQLTPF 

f^^ODFSCG^^GESEP^TPLVVATraQLU^^ 

25 S^fS 

RPTGDLLFFHDGLNASIKKAIKEIQMMNKEAGLZ 

ID117 870bp

^0 ATGCAAATTCAAAAAAGTTTTAAGGGGCAGTCTCCCTATGGCAAGCTGTATCTAGTGGCAACGCC^ 






CAGGGCTTTTGCTCAAGCATTTTGACATITCCACCAAGCAGATCAGTTTTCATGA 
TCCTCA^A^GG^ 

gaccctSSItc^^ 



35 E&KSSESSK^ 

rr Af^rACGT^GGAAAATATGTTAGAAGTCTAC 

ATGA^GAATAGCAAC^AG^TACTATCTCTGAGTTATTAGAAAGCA ... 

40 £atccI1^^^ 

' TrTA^GGTGCCTACCACGACTGGGAAGA^AXACAATAA . ^^ v .^ • - _ ^ ^ 



■ * * EDLFV^IQTRIQQGVKKNQAIKE ; ';. - ''\ ; ; . "V 

ID118 345bp

50 • .^SbATAAA^ ^ 



' • "WcGAA'GATn'GTATGTI^GTGACGAGTTGT^TACAGGGAGTA^-^-'- — J'; — 




ID119 639bp

«W ,»'--^,J.»-^/^. ! s»-l^«2.:. '• ' - - • - -' •.' «.»<-K^r-^«Sr.T/ri-l^/ro^nrTRTGCf AC •.- .. • •-->-■-»• 




, A *ig?ssAS 



TrTfiTTGCCTATCAGGGATTTGGTCGTGGCTTAGATATTGAAGCC 
CCGCGAGGTTAATCGTTTGGATTTGGAAGGGTTGGACTTGCATAAAAAAGTTCGTC 

GTCTTGTTTGACGGAATGGGCTTGGCCAAATGA 
MSKGFLYSIJiGPEGAGKTSVLEALLPILEEKGVEVLTTREPGG 

K v LPALEAGKLVnviDi^iDSSVAYQGFGRGL^IEAI^Vv LNQr r\ .^Gi^Cx-^L i ~ * /^^-^^ ~ l — 
GLDLHKKVROGYLSLLDK£GNRiVKIDASLPLEQVVETTKAVLFDGMGLAKZ 

ID120 408bp

ATGGTAGAACAAAGAAAATCAATTACCATGAAAGATGTTGCTTT^ 
GTC^tcSS^A^^ 

CATTACCT'ATA^GCCAATTGAACATTACTTGACGTCAGGAATTCCCTTTGTTA 
ATTGCCATTCCTTGTGTTTCA 

MVFORKSITMKDVALEAGVSVGTVSRVINKEKGI^^ 
ID121 285bp 


MNIFRTKNVSLDKTEMHI^LKLWDLILLGIGAMVGTGVFTITGTAAATLAGPALVISIVISALCVGLSALFFAEFASRVP 
ATGGAYSYLYAILGEFPAWLAGWLTMMEFMTAISGVASG'WAAYF 

ID124 1311bp

i^rrrrTArnATTnGTAGCCGTCCT^ 

gaSSIg^ 

GCAACGCAGAACTTGATGATGGCAGCGACTCTGGCTGATGC^ 



r.^ATTGTTGACTTAGCCATTCTCOT^ 

a^aSgSg^SgaIaScatggtacgactca^ 



TTTATGG 

TAGCTGCTGCCATGACTGGTGGTGATGTOT^ 
GTTACT^GAAATCWGTGTTGAAGTAATTGAAGAAGACGAAGGAATTCGTGTTCGTTCTCA 



tSgc^aSc^ 

-^rAArTnArCTTCGTGCCAGTGCGGCCTTGATITTGACAGGTTTGGTAGCACAGGGAGAAACTGTGGTCGGTAAA 

CTGGTTCACTTG^ATA^AGGTTACTACCGTTrCCATGAGAAGTTGGCGCAGCTA 

AGGCAAGTGATGAAGATGAATAA 

MKSRVKETSMOKJVVQGOBhTRLVGSVTIEGAKNA 
EEAm.VKVPATGDITEEAPYKYVSKMRASlWLGP^^ 

A^RLHGAHlYMD^SVGATQNLMMAATLADGVTVlENAAREPE^LAr ^i^^.^^^SfpG 
TTHNVVQDRIEAGTFMVAAAMTGGDVLIRDAVWEHNRPLIAKLLEMGVEVIEED^ 

LVAQGETVVGKLVHL.DRGYYGFHEKLAQLGAK1QRIEASDEDEZ 
ID125 1101bp



ATOTTATTAGCfflCAACAOTA^



CCTCAGGAAACGACTCCAGeTGAGAAGCAGGAAAcAGAAACAAG^XCTC 

10 i^gi^~^3H™=^^ 




15 TaACAOAAAATCACTATGATCACOTTCACGTrreAATtiAATOOATAA 



VALVTSSGNNVSMlJISJANMGQLTLGTQCQTVVVZQKriMrrFTFQZMD 






ID126 1281bp



TCTATCTTTGGTAGGCGUA 1 uua ^ i u^m- l ww« ^ • - ^G^ T XTrCTTTrrrCAAGTCTACKn-ATTTrrGTrC 
35 ATCCACGTCTGCCTCTCTATGT^AG^^ 

\CTGT 

-GAGCGACTTATGATGA^^GCTCGAGATGTAGT 



TGGATATGGGGATAGCTGGTGTrGCTrG^GA^^GTGTCTC^ 
ATTAAAACTGCCTTATGGGAAGCCAACTTTTGJ^^ 



gCaAtgcaatcggagaaGtcttgacccagtttaact^^ 

„40 - GTTGGCCCGAGCACnXGGA^^ 
■ • '7 .... SNALNILFSSLAIFVLDMGIAGVAWGTn^ 



.. . LTHLYTTDi . s ; , _ , K . 




GGTC^CC^AGCAAGTAGCCTATrATCAGGAACACTACGAAAAATTTGTCAAAAGTTAA



™eSIvFHYLKGFALYQKGQCKEGCKQMQEAMHIFDVLGLPEQVa ywEhYBm- /fifc. 




TABLE 3 
ID1 1068bp 

<; A A An A T-ro A A A A ATGTrrrTnr.Ar.GACATCATGGGAGAGCGCTTTGGTCGCTACTCC.\AGTACATTATTC 

a1gacc^gc™5cagatattcgtg^ 
t!ScS?ac^gSgagctaccgtaagtcggccaagtcagtcgg 

rA^CCTCCCCATAATTTAGCT 

aaScItg^^^^^ 

G^TTAT^AGACT^GGGAAAGGGCGCGTGGTTGTTCGTTCCAAGACTGA^ 
1 S rAAATCGTTATTATTGAGATTCCTTATGAAATCAATAAGGCCAATCTAGTCAAGAAAATCGATGATGTTCGT 

ata^JSStILcVggg^^^ 

rAA^GA^GCTAATACTGAGCTTGTTCTC 
AGAAGTGA 

20 MSNIONMSLEDIMGERFGRYSKYUQDJflAIJDIRDGLKPVQRRILYSMNlQ3SNTFDKSYRKSAKSV 

25 tdlqinynfnmvaidnftprqvglfqsclajsltvekz 

ID12 684bp

atgccgacattagaaatagcacaaaaaaaactck^^ 

n iraiiTATAf AnTTnA^GGAGATAAACTAAAAGTAATTTCCGTTACnTCTGTTAACCCTGGGGAAGGAAAAACA 

a^ttS^a^^^ 
Sa^a^g^a^^ 

CA^C^TTGTTACAAA^TAAAAATTTTAATGATATGATTGA^ 

£agctgaggcgaat^cgt^^^ 
g^t^^tX^gatatctcggttaataagtat^ 

AA .. . 






40- .... MPTLEIAQ^EFIKSASEY^^^ 



iTQkCDASIbVTATGEANKRDIQKAKQQLKQTG , ' V : V- ^ -" ; ■. ■ 



ID13 1182bp



ATGGAGGCAAATATGAAACATCTAA^ 
TtTTTAGT&GAGCCTTGGGTAGTTTTT 



' ' • - " " TAGTACTATTACACAAACTGCCTATAAGAACGAAAATTCAACAAC^vAOuc l o i 1 '^'^y^ . r~~™£™"JJ^ 

.. :•. X« r ":~ : , i» T ^CT AGTG AAGG^^ 




0>.M>-*i t^tgtgagtacaAgggacatgagaagag.^™..^ „.„ AA ;>, r4WrT 

C^GTAATATGCCTGCCA^^ _ , _ . ^ ^ 




ID15 939bp 

ATGGCAGAAATTTATCTAGCAGGTGGTTGTITITGGGGCCTAGAGGAATATT^ 

a!a?cagtgttggctacgctaatggtc^ 

AAACGGTCCAAGTGATTTACGATGAGAAGGAA^ 

TCCTCTATCTATCAATCAACAAGGGAATGACCGTGGTCGrc^^ 

GATTTGCCAGCTATCTACACAGTGGTGCAGGAGCAGGAA^ 

CAATTACGCCACTACATTCTGGCTGAAGACT^CCACCAAGACTATCT 

TCGATGTGACCGATGCTGATAAGCCA^GA^GATGCAGCAAAC^ 

CCAGTCTATCTGAAGAGTCTTATCGTGTCACAC^GAAGC^ 

GGCTATCTATTGCCTTACTTAAACAAATAA
MAEIYLAGGCFWGLEEYFSRISG^ 

ID17 870bp 

atcttcaaattgaggtctcx:gaagaacgagatgagtg^^ 

ACGAGCGTAATCTCTTGCTCAAAA^ 
TGATGTCCCTTTGGCGCGCGGTITGGGTTCTTCCAGC^^^^ 
GGTCAACTCAACTTATCAGACCATGAAAAA^GCAG^^ 
CCAGCCATITATGGTAATCTCGTTATTGCAAGTTCTroTTGAAGGGC 

AGTGTGATirrCTAGCTTACATTCCAAACTA^^^^^ 

gtcttataack3aagctgttgctck:aagtct^^^ 

ACCGCTGGGCAAGCAATCGAGGG^ 

AGAGTTGATACCCAAGGTGTCCGTGTAGAAGCAAAATAA 
-MK^PATSANIGF^^^ 

HSSSSSSiSgA^^ 



YLSGAGPTVMV: 



LASHDKMPTIKAELEKQPFKGKI^ ' 



ID20 564bp 

ATOAAATATCACOATrACATC^ 

TTGAAACATrGGCACTGTATGGTAT^CACAA^^ 
TGCGATrGAGACATTCGCTCCCAATTTAGAGA^^ 
ACACCCGATTITATrTGAAGGAGTTrCTGACCTATrc 
CTCATCGAAATGATCAGGTTTTGGAAATTTTAG 

> TCGTG^TTTAAGACAAGTATTAGACATATAA ; • r -.rf", -v - : " 

IEAGQAAGLDTHLFTSIVNLRQVLDIZ . . : ... . ^ f , ^. 

ID21 1875bp
TTGATAACTCAATTGACGAGGCCTTGGCAGGATTTGCCAGCCATATTCA^

tactgttgtggatgatgggcgtggtatcccagtcgatattcaggaaaaaacaggccgtcctgctgttgagaccgtc 
tttacagtccttcacgctggaggaaagttcggcggtggtggatacaaggtttcaggtggtcttcacggggtggggt 

cgtcagtagttaatgccctttccactcaattagacgttcatgttc^ 
ccgtcgtggtcatgttgtcgcagatcttgaaatagttggagatacggataaaacaggaacaactgttcacttcaca

ccggacccaaaaatcttcactgaaacaacaatctttgattttgataaa 

ttctaaatcgcggtcttcaaatttcaattacagataagcgccaaggtttggaacaaaccaagcattatca 
aggtgggattgctagttacgttgaatatatcaacgagaacaaggatgtaatctttgatacaccaatctatacagac 
ggtgagatggatgatatcacagttgaggtagccatgcagtacacaactggttaccatgaaaatgtcatgagtttc 
gccaataatattcatacccatgaaggtggaacacatgaacaaggtttccgtacagccttgacacgtgttatcaac
gattatgctcgtaaaaataagttactgaaagacaatgaagataatttaacaggggaagatgttcgcgaaggctta 
actgcagttatctcagttaaacacccaaatccacagtttgaaggacaaaccaagaccaaattgggaaa 

GTGGTCAAGATTACCAATCGCCTCTTCAGTGAAGCTTTCTCCGATTTCCTCATGGAAAATCCACAGATTGCC 
GTATCGTAGAAAAAGGAATTTTGGCTGCCAAGGCTCGTGTGGCTGCCAAGCGTGCGCGTGAAGTCACACGTAAAA 
AATCTGGTTTGGAAATTTCCAACCTTCCAGGGAAACTAGCAGACTGTTCTTCTAATAACCCTGCTGAAACAGAACT
CTTCATCGTCGAAGGAGACTCAGCTGGTGGATCAGCCAAATCTGGTCGTAACCGTGAGTTTCAGGCTATCCTTCCA 
ATTCGCGGTAAGATTTTGAACGTTGAAAAAGCAAGTATGGATAAGATTCTAGCCAA 

TCACAGCCATGGGAACAGGATTTGGCGCAGAATTTGATGTTTCGAAAGCCCGTTACCAAAAACTCGTTT^ 
CGATGCCGATGTCGATGGAGCCCACATTCGTACCCTTCTTTTAACCTTGATTTATCGTTATATGAAACCAATCCTA 
GAAGCTGGTTATGTTTATATTGCCCAACCACCAATCTATGGTGTCAAGGTTGGAAGCGAGArrAAAGAATATATCC
AGCCGGGTGCAGATCAAGAAATCAAACTCCAAGAAGCTTTAGCCCGTTATAGTGAAGGTCGTACCAAACCGACTA 
TTCAGCGTTATAAGGGGCTAGGTGAAATGGACGATCATCAGCTGTGGGAAACAACCATGGATCCCGAACATCGCT 
TGATGGCTAGAGTTTCTGTAGATGATGTGCAGAAGCAGATAAAATCTTTGATATGTTGA 

MTEEIKNLQAQDYDASQIQVUiGLEAVRMRPGMYlGSTS

DGRGIPVDIQEKTGRPAVETVTTVLHAGGKFGGGGYKVSGGIJiGVGS 
VADLEIVGDTDKTGTTVHFTPDPKIFTETTIFDFDKLNKRIQELAF^ 

NENKDVIFDTPIYTDGEMDDITVEVAMQYTTGYHENVMSFANNIHTHEGGTHEQGFRTALTO 
EDNLTGEDVREGLTAVISVKHPNPQFEGQTKT^ 
RAREVTRKKSGLEISNLPGKLADCSS^PAETELFiraGD^^

SLFTAMGTGFGAEFDVSKARYQKLVLMTDADVDGAHIRTLLLTLIYRYMKPILEAGYVYIAQPPIYGVKVGSEIKEY1QP 

GADQEIKXQEAIJVRYSEGRTKPTIQRYKGLGEMDDHQLWETTMDPEHRIJVIARVSVD 






ID54 1446bp



ATGAGTAGACGTTTTAAAAAATCACGTTCACAGAAAGT 
TATTGTTAGTTTGTTTTTTATTGTTCTTAATCTTTAAGTACAATATCCTTG 

CTGCGTTAGTCCTACTAGTTGCCTTGGTAGGGCTACTCTrGATTATCTATAAAAAAGCTGAAAAGTTTACTA 
CTGTTGGTGTTCTCTATCCTTGTCAGCTCTGTGTCGGTCTTTGCAGTACAGCAGTTTGTTGGACTGAGCAATGGm 
AAATGCGACTTCTAATTACTCAGAATATTCA

~TLgAACTGACGAGTCTGACAGCACCGA^ , _ 

•--AGTCXdAATAtfCGATTixiAGG^ 



*\^s;\ v ;:\V^y-'?:''^ • ' vv- 1 ^aAAAgAttt- 
■' :z -- -745 ^ : v ^ j T< 



. v / / r t :',;«J caAAAAGATAAaCT * ? 

; — - — - . - ''tggATATCAXTTCXCTATGTGC '. -■' • 

, r ■ ' /.tataAtgat^^^ 

. * - .5-0w^- . -"i AGGCTCTCGGTTTTC ' — . 




ID55 732bp

ATGATAGACATCCATTCGCATATCGTTTTTGATGTAGATGACGGTCCCAAGTCAAGAGAGGAAAGCAAGGCTCTCT 

TGGCAGAATCCTACAGACAGGGGGTGCGAACCATTGTTTCTACCTCTCACCGTCGCAAGGGCATGTTTGAAACTCC 

GGAAGAGAAGATAGCAGAAAACTTTCTTCAGGTTCGGGAAATAGCTAAGGAAGTGGCGAGTGACTTGGTCATTGC 

TTACGGGGCTGAAATTTATTACACACCAGATGTTCTGGATAAGCTGGAAAAAAAGCGGATTCCGACCCTCAATGA 

TAGTCGTTATGCCTTGATAGAGTTTAGTATGAACACTCCTTATCGCGATATTCATAGCGCCTTGAGCAAGATCTTG 

ATGTTGGGAATTACTCCAGTCATTGCCCACATTGAGCGCTATGATGCTC 

AACTGATCGATATGGGCTGTTACACGCAAGTAAATAGTTCACATGTCCTCAAACCCAAACTTTTTGGCGAACGTTA 
TAAATTCATGAAAAAAAGAGCTCAGTATTTTTTAGAGCAGGATTTGGTTCATGTCATTGCAA 

CTAGACGGTAGACCTCCTCATATGGCAGAAGCATATGACCTTGTTACCCAAAAATACGGAGAAGCGAAGGCTCAG 
GAACTTTITATAGACAATCCTCGAAAAATTGTAATGGATCAACTAATTTAG 

MIDIHSHIVFDVDDGPKSREESKALLAESYRQGVRTIVSTSHRRKGMF^^ 

YYTPDVLDKLEKKRIPTLNDSRYALffiFSMNTPYI^ 

NSSHVLKPKLFGERYKFMKKRAQYFLEQDLVHV^ 

DQLIZ 

ID58 3990bp 

TTGA.TTTATATAATCGCTATCAATATAACAATGCAATCAGG 
TTTCTATTCGTAAATACGCTGTAGGAGCAGCTTCTGTTCTAA 

CGATGGAGTTACTCCTACTACTACAGAAAACCAACCGACCATCCATACGGTTTCTGATTCCCCTCAATCATCCGAA 

AATCGGACTGAGGAAACACCTAAAGCAGTGCTTCAACCAGAAGCTCCAAAAACTGTAGAAACAGAAACTCCAGCT 

ACTGATAAGGTAGCTAGTCTTCCAAAAACAGAAGAAAAACCACAAGAGGAAGTTAGTTCAACTCCTAGTGATAAA 

GCAGAAGTGGTAACTCCAACTTCTGCTGAAAAAGAAACTGCTAATAAAAAGGCAGAAGAAGCTAGCCCTAAAAA 

GGAAGAAGCGAAAGAGGTTGATTCTAAAGAGTCAAATACAGACAAGACTGACAAGGATAAACCAGCTAAAAAAG 

ATGAAGCGAAAGCAGAGGCTGACAAACCGGCAACAGAGGCAGGAAAGGAACGTGCTGCAACTGTAAATGAAAAA 

CTAGCGAAAAAGAAAATTGTTTCTATTGATGCTGGACGTAAATATTTCTCACCAGAACAGCTCAAGGAAATCATC 

ATAAAGCGAAACATTATGGCTACACTGATTTACACCTATTAGTCGGAAATGATGGACTCCGTTTCATGTTGGACGA 

TATGAGCATCACAGCTAACGGCAAGACCTATGCCAGTGACGATGTCAAACGCGCCATTGAAAAAGGTACAAATGA 

TTATTACAACGATCCAAACGGCAATCACTTAACAGAAAGTCAAATGACAGATCTGATTAACTATGCCAAAGATAA 

AGGTATCGGTCTCATTCCGACAGTAAATAGTCCTGGACACATGGATGCGATTCTCAATGCCATGAAAGAATTGGG 

AATCCAAAACCCTAACTTTAGCTATTTTGGGAAGAAATCAGCCCGTACTGTCGATCTTGACAACGAACAAGCTGTC 

GCTTTTACAAAAGCCCTTATCGACAAGTATGCTGCTTATTTCGCGAAAAAGACTGAAATCT^ 

ATGAATATGCCAATGATGCGACAGATGCTAAAGGTTGGAGTGTGCTTCAAGCTGATAAATACTATCCAAACGAAG 
GCTACCCTGTAAAAGGCTATGAAAAATTTATTGCCTACGCCAATGACCTCGCTCGTATTGTAAAATCGCACGGTCT
CAAACCAATGGCTTTTAACGACGGTATCTACTACAATAGCGACACAAGCTTrGGTAGTTTTGACAAAGACATCATC 
GTTTCTATGTGGACTGGTGGTTGGGGAGGCTACGATGTCGCTTCTTCTAAACTACTAGCTGAAAAAGGTCACCAAA 
TCCTTAATACCAATGATGCTTGGTACTACGTTCTTGGACGAAACGCTGATGGCCAAGGCTGGTACAATCTCGATCA 
GGGGCTC^TGGTATTAAAAAC^ 

TGGTATGGTAGCTGCTTGGGtZTGACACTGCATCTGGACGTTATTCACCATCACGCCTC 
TTTGCAAATGCCAACGCTGAATACTTCGCAGCTGATTATG 

GACCTGAACCGTTATACTGCAGAAAGCGTCACGGCCGTAAAAGAAGCTGAAAAAGCTATTCGCTCTCTCGATAGC 

AACCTTAGCCGTGCCCAACAAGATACGATTGATCAAGCCATTGCTAAACTTCAAGAAACTGTCAACAACTTGACC 

CTCACGCCTGAAGCTCAAAAAGAAGAAGAAGCTAAACGTGAGGTTGAAAAACTTGCCAAAAACAAGGTAATCTCA 

ATCGATGCTGGACGCAAATACTTTACTCTGAACCAGCTCAAACGCATCGTAGACAAGGCCAGTGAGCTCGGATAT 

TCTGATGTCCATCTCCTTCTAGGAAATGACGGACTTCGCTITCTACTCGATGATATGACCATTACTGCCAACGGAA 

AAACCTATGCTAGTGATGACGTTAAAAAAGCTATTATCGAAGGAACTAAAGCTTACTACGACGATCCAAACGGTA 

CTGCACTAACACAGGCAGAAGTAACAGAGCTAATTGAATACGCTAAATCTAAGGACATCGGTCTCATCCCAGCTA 

TTAAGAGTCCAGGTCACATGGATGCTATGCTGGTTGCGATGGAAAAATTAGGTATTAAAAATCCTCAAGCCCACTT 

TGATAAAGTTTCAAAAACAACTATGGACCTGAAAAAGGAAGAAGCGATGAAGTTTGTAAAAGCCCTC 

ATACATGGAGTTCTTTGCAGGTAAAACAAAGATTTTCAACTTTGGTACTGACGAATAC 

GCCCAAGGCTGGTACTACCTCAAGTGGTATCAACTCTATGGCAAATTTGCCGAATATGGCAACACCCTCGCAGCTA 
iTGGGCAAAGAAAGAGGGCTTCAACCAATGGCCTTCAACGATGGCTTCT 

tTTGACAAAGATGTCTTGAm . 

■^\agcaaaggcta jaaattcttgaataccaacgigtgactgct 

* tggtftcctcaagaaagctXttc 
gtagatcttccaacagtcggaagtatgctttgaatctgggcagatagaccaagcgctgaatacaaggaagaggaa 

. atctitgaactcatgactgrctttgcagaccac^caaagactactttggtgctaattataatgo 

."- -aa.ttagctaaaattcctacaaacttagaa'ggatatagtaa^ 

ctctaaattacaacgtcaaccgtaataaacaagctjgagctt 

augcctcaaaccagctgtaaCtc^^ 

agaactcatcacaagaactgaagaaattccatttgaagtiatcaagaaa 






GGAAAATATTATCACAGCAGGAGTCAAAGGTGAACGAACTCATTACATCTCTGTACTCACTGAAAATGGAAAAAC 

AACAGAAACAGTCCTTGATAGCCAGGTAACCAAAGAAGTTATAAACCAAGTGGTTGAAGTTGGCGCTCCTGTAAC 

TCACAAGGGTGATGAA.\GTGGTCTTGCACCAACTACTGAGGTAAAACCTAGACTGGATATCCAAGAAGAAGAAAT 

TCCATTTACCACAGTGACTTGTGAAAATCCACTCTTACTCAAAGGAAAAACACAAGTCATTACTAAGGGCGTCAAT 

GGACATCGTAGCAACTTCTACTCTGTGAGCACTTCTGCCGATGGTAAGGAAGTGAAAACACTTGTAAATAGTGTCG 

TAGCACAGGAAGCCGTTACTCAAATAGTCGAAGTCGGAACTATGGTAACACATGTAGGCGATGAAAACGGACAAG 

CCGCTATTGCTGAAGAAAAACCAAAACTAGAAATCCCAAGCCAACCAGCTCCATCAACTGCTCCTGCTGAGGAAA 

GCAAAGTTCTTCCTC AAGATCCAGCTCCTGTGGTAACAGAG.AAAAAACTTCCTGAAACAGGAACTC ACGATTCTG 

CAGGACTAGTAGTCGCAGGACTCATGTCCACACTAGCAGCCTATGGACTCAGTAAAAGAAAAGAAGACTAA 



MIYUAINITMQSGGFAMKJffiKQQRFSIRKYAVGAA 

TPKAVLQPEAPKTVETETPATDKVASLPKTEEKPQEEVSSTPSDKAEVVT^ 

SNTDKTDKDKPAKKDEAKAEADKPATEAGKERAATW 

VGNDGLRFMLDDMSITANGKTYASDDVKRAIEKGTO 
AILNAMKJELGIQNPNFSYFGKKSARTVDLDNEQAVAFT^^

DKYYPNEGYPVKGYEKFIAYANDLARIVKSHGLKPMAPKDGIYY^ 

EKGHQILNTNDAWYYVLGRNADGQGWYNLDQGLNGI^ 

MRHFANANAEYFAADYESAEQALNEVPKDLNRYTAESVT^^ 

LTPEAQKEEEAKREVTiKLAXNKVISIDAGRKYFTLNQL 
SDDVKXAIIEGTKAYYDDPNGTALTQAEVTELIEYAK^

MDLKNEEAiMNF^ALIGKYMDFFAGKTXIFNFGTO 

QPMAFNDGFYYEDKI>DVQFDKI>VLISYWSKG\VV/GY^^ 

ENTGKTPFNQLASTKYPEVDLPTVGSMLSIWADRPSAEYKEEEra 

YSKESLEALDAAKTALNYNLNRNKQAELDTLVANLKAALQGLKPAVTHSGSLDENEVAANVETRPELr^ 
KKENPNLPAGQENIITAGVKGERTHYISVLTENGKTTETVLDSQVTKJEVINQVVEVG
DIQEEEIPFITVTCENPLLLKGKTQVITKGVNGHRSNFYSVSTSAD 
NGQAAIAEEKPKi-EIPSQPAPSTAPAEESKVLPQDPAPVVTEKKLPET 



ID122 825bp 



ATGAACAAAAAAACAAGACAGACACTAATCGGACTGCTAGTGTTATTGCTTTTGTCTACAGGGAGCTATTATATCA 
AGCAGATGCCGTCGGCACCTAATAGTCCCAAAACCAATCTTAGTCAGAAAAAACAAGCGTCTGAAGCTCCTAGTC 
AAGCATTGGCAGAGAGTGTCTTAACAGACGCAGTCAAGAGTCAAATAAAGGGGAGTCTGGAGTGGAATGGCTCAG 
GTGCTTTTATCGTCAATGGTAATAAAACAAATCTAGATGCCAAGGTTTCAAGTAAGCCCTACGCTGACAATAAAAC 

AAAGACAGTGGGCAAGGAAACTGTTCCAACCGTAGCTAATGCCCTCTTGTCTAAGGCCACTCGTCAGTACAAGAA
TCGTAAAGAAACTGGGAATGGTTCAACTTCTTGGACTCCTCCAGGTTGGCATCAGGTCAAGAATCTAAAGGGCTCT
TATACCCATGCAGTCGATAGAGGTCATTTGTTAGGCTATGCCTTAATCGGTGGTTTGGATGGTTTTGATGCCTCAA 
CAAGGAATCCTAAAAACATTGCTGTTCAGACAGCCTGGGCAAATCAGGCACAAGCCGAGTATTCGACTGGTCAAA 
ACTACTATGAAAGC AAGGTGCGTAAAGCCTO 

CTTCAjVACGAGGATTTAGTTCCCTCAGC^C



.- >A.!",T"."- :'. . PSASQIEAKSSPGELEF>TVLWNVQKGLQLpYRTGEVTVTQZ * . . 



ID123 225bp 



:. ;.\ < V" . -50 ^ : -gxggtaAgattcagcgpatxqaggcaagt,gatoaag 
CLAIMS:

1. A Streptococcus pneumoniae protein or polypeptide having a sequence
selected from those shown in Table 1.

2. A Streptococcus pneumoniae protein or polypeptide having a sequence
selected from those shown in Table 2.

3. A protein or polypeptide as claimed in claim 1 or claim 2 provided in
substantially pure form.

4. A protein or polypeptide which is substantially identical to one defined in any one 
of claims 1 to 3. 

5. A homologue or derivative of a protein or polypeptide as defined in any one of
claims 1 to 4. 

6. An antigenic and/or immunogenic fragment of a protein or polypeptide as defined
in Tables 1-3.

7. A nucleic acid molecule comprising or consisting of a sequence which is:

(i) any of the DNA sequences set out in Table 1 or their RNA equivalents;

(ii) a sequence which is complementary to any of the sequences of (i);

(iii) a sequence which codes for the same protein or polypeptide as those
sequences of (i) or (ii);

(iv) a sequence which is substantially identical with any of those of (i), (ii)
and (iii);

(v) a sequence which codes for a homologue, derivative or fragment of a
protein as defined in Table 1.

8. A nucleic acid molecule comprising or consisting of a sequence which is: 

(i) any of the DNA sequences set out in Table 2 or their RNA equivalents; 

(ii) a sequence which is complementary to any of the sequences of (i); 

(iii) a sequence which codes for the same protein or polypeptide as those
sequences of (i) or (ii);

(iv) a sequence which is substantially identical with any of those of (i), (ii)
and (iii);

(v) a sequence which codes for a homologue, derivative or fragment of a
protein as defined in Table 2.

9. The use of a protein or polypeptide having a sequence selected from those
shown in Tables 1-3, or homologues, derivatives and/or fragments thereof, [...]

10. [...] 3, or homologues or derivatives thereof, and/or fragments of any of these.

11. An immunogenic and/or antigenic [...] in claim 10 which is
a vaccine or is for use in a diagnostic assay.



12. A vaccine as claimed in claim 11 which comprises one or more additional
components selected from excipients, diluents, adjuvants or the like.

13. A vaccine composition comprising one or more nucleic acid sequences as 
defined in Tables 1-3. 



14. A method for the detection/diagnosis of S. pneumoniae which comprises the
step of bringing into contact a sample to be tested with at least one protein or
polypeptide as defined in Tables 1-3, or homologue, derivative or fragment thereof.

15. An antibody capable of binding to a protein or polypeptide as defined in Tables
1-3, or to a homologue, derivative or fragment thereof.

16. An antibody as defined in claim 15 which is a monoclonal antibody.

17. A method for the detection/diagnosis of S. pneumoniae which comprises the step
of bringing into contact a sample to be tested and at least one antibody as defined in
claim 15 or claim 16.



18. A method for the detection/diagnosis of S. pneumoniae which comprises the
step of bringing into contact a sample to be tested with at least one nucleic acid
sequence as defined in claim 7 or claim 8.

19. A method of determining whether a protein or polypeptide as defined in Tables
1-3 presents a potential anti-microbial target which comprises inactivating said
protein or polypeptide and determining whether S. pneumoniae is still viable.
20. The use of an agent capable of antagonising, inhibiting or otherwise interfering
with the function or expression of a protein or polypeptide as defined in Tables 1-3 in
the manufacture of a medicament for use in the treatment or prophylaxis of
S. pneumoniae infection.